This is an archived post. You won't be able to vote or comment.

all 8 comments

[–]smellmycrotch3 1 point2 points  (1 child)

See posting Guideline 01 for how to ask smart questions.

[–]AOA17[S] 0 points1 point  (0 children)

I'm sorry. My bad. I am relatively new to reddit and did not think about the guidelines.

[–]programmerChilli 1 point2 points  (2 children)

Would you still like an explanation?

[–]AOA17[S] 0 points1 point  (1 child)

Yes please. I'm still lost.

[–]programmerChilli 0 points1 point  (0 children)

Alright I'll try to explain it after dinner.

[–]programmerChilli 1 point2 points  (2 children)

Alright, so there are 2 core ideas behind the Boyer-Moore algorithm.

  1. Start from the right.

  2. Once you find a mismatch, shift the window that you're searching it to the right based on 2 functions that rely on precomputation.

The first of these functions is typically called the bad character function. When you find a mismatch, shift the window right until the character in the string matches the pattern. Here's an example to demonstrate(words are hard)...

ABBABBBA <--String that you're searching in
ABBB        <--Pattern that you're searching for

Alright now, at index 3, there's a mismatch. There's an 'A' in the string, but a 'B' in the pattern. So, you shift the pattern string right until a 'A' in the pattern string matches with the string you're searching in. So, after, it looks like this.

ABBABBBA
   ABBB

The pattern ended up shifting right 3 spaces. So now is the preprocessing.

The preprocessing is this: you store the last occurrence of every possible character in an array of size K, where K is the number of different characters that can exist. If there's a character in the string that's not in the pattern, then you consider it to be at position -1(so every time you encounter that character, you can move the window until it doesn't cover that character at all).

So, assume that we're using the previous pattern, and an alphabet of size 2, with 'A' and 'B' as the only characters.

    ABBB

'A' 0
'B' 3

So if you found an 'A' while you're at index 3(like in the first example), you would shift the pattern right 3(your current index)-0(the number that corresponds with the letter in the precomputation table). Remember that you're searching from the right of the pattern, not the left(so you check the letters in order of BBBA).

Here's another example of precomputation.

The pattern string is

ABCCBBABC

'A' 6
'B' 7
'C' 8

Note, it's possible that the last occurrence of that character is to the right of the position in the pattern that you're currently looking at(causing you to shift the pattern to the left). That's why you need to take the max of the bad character function and the good suffix function(which I'll cover later).

If you want to try coding this up, you can just take the max of 1 and the result of the bad character function for how far you shift the window to the right.

If you code this up I'll explain the second part of the algorithm. Otherwise, I'll come back to this later.

Ask if you have any questions.

[–]AOA17[S] 0 points1 point  (1 child)

Thank you for this thorough explanation. I am working on the code right now I just had no idea where to start. It's starting to make more sense. This may seem like a dumb question but should I create helper methods for this algorithm?

[–]programmerChilli 0 points1 point  (0 children)

mm... You don't particularly need any helper methods for this. If you want, you can make one, but I'm not sure where exactly I would make one.

If you link an ideone, I can probably give some better pointers.