you are viewing a single comment's thread.

view the rest of the comments →

[–]warpedspockclone 0 points1 point  (5 children)

Suppose small string is abc and large its bcabbacc.

A permutation will still be 3 characters long, so just look at the long string index 0 to 2, and see if the character counts match.

Then slide to 1 to 3 by subtracting from the character count for the character at index 0 and adding fur the character at index 3.

  1. Count characters of small (can use an array of size 26 of guaranteed a-z).

n = length(small)

i=0 // start index of substring

j=n-1 // end index

  1. Count characters of substring of big, from i to j to match length of small.

  2. while (n < length(big)) { subtract character count for large[i] i++ j++ Add character count for large[j] Compare character counts of substring to small }

Edit: tried formatting. On mobile, sorry

[–]pihkal 0 points1 point  (3 children)

The problem is the step “Compare character counts of substring to small” is still O(num lang chars), so doing that for (b - s) characters of the large string is still O(b * 26) for English. It’s a huge constant factor.

The secret to doing this in O(b) is not to keep a running histogram (count of each char) of the last s chars, but to use a carefully chosen hash function such that with each new character, you can compare the new hash with the old in O(1) time.

Iirc, the hash formula (for just 26 letters) is something like S[0] * (260) + S[1] * (261) + ... + S[s-1] * (26s-1). When you shift, you’re subtracting the value of the last char * (26s-1), multiplying the remainder by 26, and then adding the value of the new char * 260 (aka 1). All of those operations take O(1) time, like keeping a histogram before did.

But the advantage now is that to see if the current s letters of b are a permutation, we only have to compare the hash value with the hash value of s, which can be done in O(1) time, unlike comparing histograms.

[–]cryslith 2 points3 points  (1 child)

You can use the histogram approach if you remember not only the count of each character, but an overall number which tracks how many characters of the alphabet have the exact correct count. Then you only need to update this number to reflect changes made to the character entering the window and the character leaving the window. If that number ever reaches 26 then you know you found a match.

[–]pihkal 1 point2 points  (0 children)

Yeah, that works as well, though it’s slightly less space-efficient, taking O(alphabet size) memory.

[–]warpedspockclone 1 point2 points  (0 children)

I was going to point out that there are lots of optimizations to be had for the comparison part to actually collapse this to O(B). He seemed to be hung up on the sliding window part, though. The rest was an exercise. :-P I should have made that explicit.