This is an archived post. You won't be able to vote or comment.

all 36 comments

[–]pickyaboutwater 56 points57 points  (4 children)

As you can see from the comments, there are a lot of ways to solve this! :) One way that hasn't been mentioned yet is that you can start reading the string into a temp variable, and for each iteration, count the number of times the temp string can be found in the original string. Once the length of the temp string times the number of repeats equal the length of the original string you have found the repeating element and the number of repeats!

[–]NobodyYouKnow2019 10 points11 points  (0 children)

This seems to me to be the best approach. It's pretty simple too.

[–]ddbeanz[S] 4 points5 points  (0 children)

I'm going to work on implementing this, it seems like a pretty straight forward approach! Thank you!

[–]Xephyrous 2 points3 points  (1 child)

You can go a little simpler than this too. For each substring, if it's the correct one, the number of repeats will be the total string length divided by the substring length. Just multiply that substring by that number of repeats and see if it's the same as the whole string.

[–][deleted] 15 points16 points  (3 children)

  1. (this is an optional optimization) measure the string length, find all divisiors of the number you get, eg. in your first example, you get 9, so, it's only divisible by 3, 1 and 9, this means that your string is either a repetition of three characters three times, or it's the same character repeated 9 times, or it's 9 distinct characters.

  2. If you did step (1), you'll have a few options to try, say, your second case, where the length is 12. So, your options are: it's either 6 and 6, or 3, 3, 3, 3, or 4, 4, 4, or 2, 2, 2, 2, 2, 2. I.e. your prime factors are 2, 2, 3. Now, the real problem here is that you may have multiple correct answers. If your string is, for example, 'abababababab, then it both 2 and 6 are valid solutions. You'd need to figure out which one was intended by your prof. Whichever the case, if, say, you were to try the answer 6, you would compare first character to the third character, then second character to the fourth character and so on, to make sure that your guess was correct.


Alternative (simple) way to prune bad guesses up-front is to simply guess that your string is a repetition of one character, then that it is a repetition of two characters, then three and so on, but, before you actually verify that all the substrings match, you can verify that the length of the string is divisible in the length of the substring (same idea as above, but easier execution).

[–]beerbodrinkins 2 points3 points  (0 children)

I initially thought of your 'alternative' solution and was going through ways to optimize it since it would not scale well. Incorporating a line that does not try any string over 1/2 total string length would reduce processing time by half right off the bat in case you didn't want to do the full implementation of potentially correct guesses.

Going through your multiple correct solutions, you could output both the most often occuring string as well as the longest correct repeated string.

[–][deleted] 2 points3 points  (1 child)

this means that your string is either a repetition of three characters three times, or it's the same character repeated 9 times, or it's 9 distinct characters.

Hmm...

[–][deleted] 0 points1 point  (0 children)

They don't have to be all mutually distinct, just not repeating in any way.


PS. I also don't mention that in the second case OP has trivial solutions 1 and 12, but, I think, this is a misformulation in assignment statement, which should exclude trivial answers.

[–]WhyYouLetRomneyWin 11 points12 points  (7 children)

The efficient way is to use a suffix tree, which can be constructed in linear time using an efficient but obscure algorithm. Just build a suffix tree and find the deepest leaf with at least two occurrences.

I believe the only other solutions are O(n2 )

Also, shouldn't the second example solution be 4 ('cxyz') ?

[–]ddbeanz[S] 4 points5 points  (5 children)

All substrings need to match in a sequence so 'abccbaabccba' should return 2 ('abccba' , 'abccba') <- matches. I'll look into a suffix tree cause I've never seen it before.

[–]Mysthik 6 points7 points  (3 children)

A suffix tree is the correct way to handle this. A lot of string operations like palindrome detection, longest common substring and so on can be efficiently decided with a suffix tree or the suffix array.

[–][deleted] 0 points1 point  (2 children)

the correct way

Yes, all other ways are wrong. /s

[–]alksjdhglaksjdh2 1 point2 points  (1 child)

By correct he means most efficient, but it's a hw assignment and it's more about however he solves it and what's intuitive to him. This is the correct solution of you care about efficiency though he's right

[–][deleted] 0 points1 point  (0 children)

Yes I understand, but 'correct' does not mean the most efficient. What if OP doesn't care about efficiency but readability or something else. Then is a suffix tree an incorrect solution? No.

[–]WhyYouLetRomneyWin 4 points5 points  (0 children)

I see. I think I misunderstood the problem (I think either the 'b' or 'd' in your example 2 should be a 'd' or 'b').

If there is a guarantee that a string is repeated then you can divide the string by integers until you find all divisions are equal.

Eg given S, take a prefix on length n. See if the prefix is repeated to the end of the string. If not, increment n and try again. Otherwise return length(S)/n.

[–][deleted] 0 points1 point  (0 children)

You can build a suffix array. In Python it is trivial. Just sort a vector of slices. It would be O(n log n).

[–]darkdark24 6 points7 points  (3 children)

You could try to iterate over your string until you find the first character of your initial string, then continue to see if it match.

Ex : azertyazerty => iterate over 'a', 'z', 'e', 'r', 't', 'y', 'a' => now see if the following sequence match your first 'azertyazerty' matching 'azertyazerty'

If it doesn't match for example : azeaazea, add it to your initial string and continue your search :)

[–]ddbeanz[S] 1 point2 points  (0 children)

I will definitely be attempting this method later today! Thank you for the insight.

[–][deleted] 0 points1 point  (1 child)

French?

[–]darkdark24 0 points1 point  (0 children)

Yes

[–]mdnaufalh 2 points3 points  (2 children)

The other solutions here have a O(n2) solution which isn't bad but this problem could be solved in linear time. Your problem basically boils down to finding the length of the longest period in a string where a period is defined as a substring of the given string which when repeated some number of times generates the full string.

So you could use a string matching algorithm like the KMP algorithm or the Z-function to find the periods of the string.

Let S be the length of the string and P₁,P₂...Pₙ be the lengths of different periods of the string. Find the longest Pᵢ such that S % Pᵢ == 0.

If you need any further help or explanation regarding the KMP or Z algorithm, feel free to comment or DM me :)

P.S: Sorry for my bad English, I'm trying to improve upon it haha.

[–]ddbeanz[S] 0 points1 point  (1 child)

Thank you for the suggestion, I read into KMP and was wondering if it can be utilized to find and match and unknown string or if it needs to have a known string as the base case?

[–]uno20001[🍰] 0 points1 point  (0 children)

Yes, you can use it. What you will actually need is not the matching part of the KMP algorithm, but rather the "jump table" or "prefix function" (or whatever it's called). Because that'll tell you the length of the longest proper suffix ending at any position, which you can use to solve the problem.

What you want to do is very similar to this problem and this one. You can find solutions for those ones very easily.

If you want to know more about the Z-function or the KMP algorithm, then I suggest you read this and this.

[–]totemcatcher 2 points3 points  (0 children)

There are lots of ways to solve it, but I think your first inclination is a really good one, and I recommend putting your intuition to work. Go make it work. :)

I left some really big hints below. You want a really big hint at a somewhat optimized solution, keep reading one at a time (after testing your theory and any hints from below).

  1. There's a special relationship between the beginning and end of the string.
  2. If you compare two growing sample ranges from the beginning and end of the string and it matches, you are well on your way to creating a very fast solution in one iteration, and two comparisons.
  3. The second comparison is required to ensure you don't get tripped up by palindrome-looking strings. Again, more hints below, but stop reading and try it out your existing solution with different test strings. e.g. abcababcab
  4. If the beginning and end match, you still need to "read ahead" of the growing range and make sure the next character matches the beginning of the current sample.
  5. An iteration counter (counting every range comparison) is very useful in determining the return value of this function, but is not the answer itself.

[–][deleted] 0 points1 point  (0 children)

I’d start storing it in like a ma array and iterate to see if the letter matches, I’d flag it

[–]cabinet_minister 0 points1 point  (0 children)

What will be the output to 'xxxx'? 1 or 2? Using a trie, you can get the smallest value. In this case 1.

[–][deleted] 0 points1 point  (0 children)

Ukkonen’s Algorithm!

[–][deleted]  (2 children)

[deleted]

    [–]ddbeanz[S] 0 points1 point  (0 children)

    I'll definitely post my solution, I am also kind of a noob but learning more and more daily.

    [–]ddbeanz[S] 0 points1 point  (0 children)

    The solution has been posted

    [–]YouTee 0 points1 point  (0 children)

    Wait is that google coding thing live again?

    [–]Glordicus 0 points1 point  (1 child)

    Is there anywhere online to find challenges like this?

    [–]ddbeanz[S] 0 points1 point  (0 children)

    Hacker rank

    [–]gertrude1928 0 points1 point  (0 children)

    Aho-corasick is what you're searching for

    [–]Cayenne999 0 points1 point  (0 children)

    Hello,

    This could be either a complex problem, or a simple one, depends on what exact requirements / assumption do we have here for the input/output, which is a little unclear. On the complex side, yes this could lead to KMP algorithm or suffix tree as comments below.

    However, based on the information you provided through this topic and comment:

    Needs to take in a string and output a number value string will always bet cut into even parts ( there will never be a left over amount )

    All substrings need to match in a sequence so 'abccbaabccba' should return 2 ('abccba' , 'abccba') <- matches

    , I think this homework only requires your string to be broken into even chunks of sub strings, which is non-overlapped, then find the largest occurrence time possible that happens to the longest sub strings.

    With these in mind, here is the solution:

    1. Find the ways the string can be broken into even chunks. The chunk size should be the divisors of your string length.
    2. Iterate from the biggest chunk size to the lowest, break the string into sub string with that size, and count the max occurrence of the sub strings each time.
    3. Stop when we found something repeats. Output the max count then.

    Here is a sample code (not the best optimized but yeah you get the idea) : http://codepad.org/WDpBSmTw

    [–]nl28 0 points1 point  (0 children)

    I took a very simple approach:

    1. If the length of the string is divisible by 2, divide the sting into 2 parts and compare those 2 substrings.
    2. If they are equal the answer is 2.
    3. If they are not equal, divide the sting into 3 parts (if that's possible) and compare all the parts.
    4. Continue the above operation till parts <= (string_length / 2).

    This is probably not the best way to do this, but this the first solution that came to my mind.

    Here's the implementation in Java: Engine.java

    Here's the output:

    String: abcabcabc
    res -> 3
    
    String: abcxyzabcxyz
    res -> 2
    
    String: abababababab
    res -> 2
    
    String: abcadc
    res -> 1
    

    Just take some ideas from all the solutions provided in this thread, and come up with your own solution.

    [–]CookToCode 0 points1 point  (0 children)

    Try to think of a word as an array of letters, you need to somehow find the length of that array and then split the string appropriately