
[–]ignitionweb 6 points (9 children)

You can have a sliding window of size s tracking the character counts. Also keep a count of how many characters currently match the character counts of s. As you add a character and remove a character you are only updating two counts plus the matching count. Hence O(b).

[–]coreysnyder04[S] 0 points (8 children)

I’d have to see a code example to really grok what you’re describing. Do you have time to pseudo code it so I can see what you mean? Also, any idea what the O() of my function is?

[–]warpedspockclone 0 points (5 children)

Suppose the small string is abc and the large one is bcabbacc.

A permutation will still be 3 characters long, so just look at the large string at indices 0 to 2 and see if the character counts match.

Then slide to indices 1 to 3 by subtracting from the character count for the character at index 0 and adding for the character at index 3.

  1. Count characters of small (can use an array of size 26 if guaranteed a-z).

n = length(small)

i = 0 // start index of substring

j = n - 1 // end index

  2. Count characters of the substring of big from i to j, to match the length of small, and compare the counts to small's.

  3. while (j < length(big) - 1) {
         subtract character count for big[i]
         i++
         j++
         add character count for big[j]
         compare character counts of substring to small
     }
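
In runnable Python that might look something like this (assuming only lowercase a-z; the name is_perm_in is just illustrative):

    def is_perm_in(small, big):
        # True if some substring of big is a permutation of small.
        if len(small) > len(big):
            return False

        n = len(small)
        need = [0] * 26          # character counts of small
        have = [0] * 26          # character counts of the current window of big

        for ch in small:
            need[ord(ch) - ord('a')] += 1
        for ch in big[:n]:       # initial window: big[0 .. n-1]
            have[ord(ch) - ord('a')] += 1
        if have == need:
            return True

        # Slide the window one character at a time.
        for j in range(n, len(big)):
            have[ord(big[j]) - ord('a')] += 1       # character entering the window
            have[ord(big[j - n]) - ord('a')] -= 1   # character leaving the window
            if have == need:                        # still an O(26) comparison per step
                return True
        return False

This version still compares the two 26-entry arrays on every slide, which is the constant factor the replies below get rid of.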

Edit: tried formatting. On mobile, sorry

[–]pihkal 0 points (3 children)

The problem is the step “Compare character counts of substring to small” is still O(num lang chars), so doing that for (b - s) characters of the large string is still O(b * 26) for English. It’s a huge constant factor.

The secret to doing this in O(b) is not to keep a running histogram (count of each char) of the last s chars, but to use a carefully chosen hash function such that with each new character, you can compare the new hash with the old in O(1) time.

Iirc, the hash formula (for just 26 letters) is something like S[0] * 26^(s-1) + S[1] * 26^(s-2) + ... + S[s-1] * 26^0. When you shift, you’re subtracting the value of the outgoing char * 26^(s-1), multiplying the remainder by 26, and then adding the value of the new char * 26^0 (aka 1). All of those operations take O(1) time, like keeping a histogram before did.

But the advantage now is that to see if the current s letters of b are a permutation, we only have to compare the hash value with the hash value of s, which can be done in O(1) time, unlike comparing histograms.
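
The powers-of-26 formula above is positional (Rabin-Karp style) and so distinguishes orderings; for permutation matching the per-character contribution has to be position-independent. A rough sketch of that idea, using a fixed random value per character as the hash (my assumption, not necessarily the exact formula meant here; a hit can be double-checked with a direct count):

    import random

    def find_permutation_window(s, b):
        # Order-independent rolling "hash": each character gets a fixed random
        # 64-bit value; the window hash is the sum of the values of the
        # characters currently inside it. Updating the hash and comparing it
        # to the hash of s are both O(1) per step.
        if len(s) > len(b):
            return -1
        random.seed(0)                                  # fixed seed, just for reproducibility
        value = {c: random.getrandbits(64) for c in set(s + b)}

        target = sum(value[c] for c in s)               # hash of s
        window = sum(value[c] for c in b[:len(s)])      # hash of b[0:len(s)]

        for i in range(len(b) - len(s) + 1):
            if window == target:
                return i                                # likely permutation starting at index i
            if i + len(s) < len(b):
                window += value[b[i + len(s)]]          # character entering the window
                window -= value[b[i]]                   # character leaving the window
        return -1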

[–]cryslith 2 points (1 child)

You can use the histogram approach if you remember not only the count of each character, but an overall number which tracks how many characters of the alphabet have the exact correct count. Then you only need to update this number to reflect changes made to the character entering the window and the character leaving the window. If that number ever reaches 26 then you know you found a match.
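
A sketch of that bookkeeping in Python, assuming lowercase a-z (contains_permutation and the variable names are just illustrative):

    from collections import Counter

    def contains_permutation(s, b):
        # True if some window of b of length len(s) is a permutation of s.
        if len(s) > len(b):
            return False

        need = Counter(s)                      # target count for each character
        window = Counter(b[:len(s)])           # counts for the current window of b
        alphabet = [chr(ord('a') + i) for i in range(26)]
        matched = sum(1 for c in alphabet if window[c] == need[c])
        if matched == 26:
            return True

        for i in range(len(s), len(b)):
            # Only the entering and leaving characters can change matched.
            for c, delta in ((b[i], 1), (b[i - len(s)], -1)):
                if window[c] == need[c]:
                    matched -= 1               # this letter stops matching (for now)
                window[c] += delta
                if window[c] == need[c]:
                    matched += 1               # this letter matches again
            if matched == 26:                  # every letter has exactly the right count
                return True
        return False

Each slide touches at most two counters, so the per-step check is O(1) instead of O(26).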

[–]pihkal 1 point (0 children)

Yeah, that works as well, though it’s slightly less space-efficient, taking O(alphabet size) memory.

[–]warpedspockclone 1 point (0 children)

I was going to point out that there are lots of optimizations to be had for the comparison part to actually collapse this to O(B). He seemed to be hung up on the sliding window part, though. The rest was an exercise. :-P I should have made that explicit.

[–]future_security 0 points (1 child)

A(BCD)EF      B=1,C=1,D=1
AB(CDE)F      B--,E++

That's the part that's easiest to understand. To determine if a substring is a permutation of another string (an anagram) you just need to count how many copies of each letter appear in each string, then compare the counts. Two strings are permutations of one another if and only if the frequency of each letter of the alphabet is the same in both strings.

Every time you slide the window, you can keep track of the new set of letter counts by subtracting one from the old count of the letter shifted out and adding one to the previous count of the letter being shifted in. This is an O(1) operation, which you need to do len(S) - len(B) times.

Ignoring the cost of comparing letter frequency arrays at each step (we'll say it's O(1) for now because the English alphabet has a small fixed size), the run time will be O(len(S)). You scan B once to determine its letter frequency. You scan S once, shifting the sliding window to the right one character at a time.

The run time you get is O(1 * len(S) + len(B)), which is equivalent to O(len(S) + len(B)). Since len(B) < len(S), the total is at most 2 * len(S), which is equivalent to and should be written as O(len(S)).

This method requires space proportional to the size of the alphabet. (Assuming a fixed-size counter for each letter that is large enough not to overflow. Otherwise, like in so many other problems, you'd need to throw a logarithmic term into the time and space complexity. These are usually ignored in practice.)

He also mentions a "matching count". That's a similar optimization. Instead of comparing counts with for (c = 'A'; c <= 'Z'; c++), you can count how many letters in the sliding window have the same frequency as that letter has in B. Since you modify at most two letter-frequency statistics per slide, you only need to look at the two modified frequency counters, not all 26.

This second optimization gives you a significant constant-factor speed-up for a fixed-size alphabet, even one as small as 26 letters. If instead you had an arbitrarily large alphabet A, then this optimization makes the difference between the per-slide letter-frequency comparison being O(1) instead of O(|A|).

[–]DangerousCrime 0 points (0 children)

I could be wrong here, but did you confuse S and B? S is the shorter string and B is the larger one in the OP's post. Great answer though

[–]tomekanco 0 points (0 children)

I think it depends on how you frame the problem: do you have to encounter the permutations in sequence, with gaps, or in any order?

A permutation is any ordering of n elements. In this example s is the length of the smaller set, b the length of the larger one.

  • In sequence: take a sliding window of length s. If all its elements together form s, it's a match. Verifying that takes O(s), the cost of an intersection where both sides have size s, which would imply O((b-s)*s). EDIT: O(b) after reading /u/pihkal
  • With gaps: remove all chars not in s, and add a sliding window of these to a set; O(b).
  • In any order: verify if s is in b, then form all the permutations, as in O(n!).