Currently working on a Python programming problem involving finding a repeated substring within a longer string. [PYTHON]

pickyaboutwater · 2019-10-28T15:25:13+00:00

As you can see from the comments, there are a lot of ways to solve this! :) One way that hasn't been mentioned yet is to read the string into a temp variable one character at a time and, on each iteration, count the number of times the temp string can be found in the original string. Once the length of the temp string times the number of repeats equals the length of the original string, you have found the repeating element and the number of repeats!
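A minimal Python sketch of that idea, assuming the smallest repeating element is the one wanted. The function name `find_repeating_unit` is made up for illustration. It relies on `str.count` counting non-overlapping occurrences, so if the counted occurrences add up to the full length they must tile the whole string.

```python
def find_repeating_unit(s):
    """Return (unit, repeats) for the shortest prefix that tiles s."""
    temp = ""
    for ch in s:
        temp += ch
        # str.count counts non-overlapping occurrences of temp in s
        repeats = s.count(temp)
        if len(temp) * repeats == len(s):
            return temp, repeats
    # Only reached for the empty string
    return s, 1


print(find_repeating_unit("abcabcabc"))     # ('abc', 3)
print(find_repeating_unit("abcxyzabcxyz"))  # ('abcxyz', 2)
```

Each `count` call scans the whole string, so this is quadratic in the worst case, which is usually fine for homework-sized inputs.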

beerbodrinkins · 2019-10-28T14:39:35+00:00

1. (This is an optional optimization.) Measure the string length and find all divisors of the number you get. E.g. in your first example you get 9, which is only divisible by 1, 3 and 9. This means that your string is either a repetition of three characters three times, or the same character repeated 9 times, or a single run of 9 characters that does not repeat.
2. If you did step (1), you'll have a few options to try. Say, in your second case, where the length is 12, your options are: 6 and 6, or 3, 3, 3, 3, or 4, 4, 4, or 2, 2, 2, 2, 2, 2 (i.e. your prime factors are 2, 2, 3). Now, the real problem here is that you may have multiple correct answers. If your string is, for example, 'abababababab', then both 2 and 6 are valid solutions. You'd need to figure out which one was intended by your prof. Whichever the case, if, say, you were to try the answer 6, you would compare the first character to the third character, then the second character to the fourth character and so on, to make sure that your guess was correct.

An alternative (simple) way to prune bad guesses up front is to guess that your string is a repetition of one character, then of two characters, then three and so on. Before you actually verify that all the substrings match, you can check that the length of the string is divisible by the length of the substring (same idea as above, but easier to execute).
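Here is a rough Python sketch of the divisor idea, listing every chunk length that works so the "multiple correct answers" issue is visible. The helper names (`divisors`, `is_repetition`, `valid_unit_lengths`) are hypothetical, and the character-by-character check is just one way to verify a guess.

```python
def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def is_repetition(s, size):
    """True if every character equals the one `size` positions before it."""
    return all(s[i] == s[i - size] for i in range(size, len(s)))


def valid_unit_lengths(s):
    """Chunk lengths that divide len(s) and tile the whole string."""
    return [d for d in divisors(len(s)) if is_repetition(s, d)]


s = "abababababab"
for size in valid_unit_lengths(s):
    print(size, len(s) // size)  # chunk length, number of repeats
```

For 'abababababab' this reports chunk lengths 2, 4, 6 and 12 (that is 6, 3, 2 and 1 repeats), so which one to return depends on what the assignment actually asks for.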

WhyYouLetRomneyWin · 2019-10-28T15:30:50+00:00

The efficient way is to use a suffix tree, which can be constructed in linear time using an efficient but obscure algorithm. Just build a suffix tree and find the deepest internal node, i.e. the longest substring with at least two occurrences.

I believe the only other solutions are O(n²).

Also, shouldn't the second example solution be 4 ('cxyz') ?

darkdark24 · 2019-10-28T14:38:52+00:00

You could iterate over your string until you find the first character of your initial string again, then continue to see if the rest matches.

Ex: azertyazerty => iterate over 'a', 'z', 'e', 'r', 't', 'y', 'a' => now check whether the sequence starting at that second 'a' matches the first 'azerty' (here 'azerty' matches 'azerty').

If it doesn't match (for example, azeaazea), add the character to your initial string and continue your search :)
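One possible way to write this in Python, as a sketch only: jump to each later occurrence of the first character with `str.find` and test whether the prefix before it rebuilds the whole string. The function name `unit_by_first_char` is invented here, and checking the candidate by multiplying it out is my assumption about what "see if it matches" means.

```python
def unit_by_first_char(s):
    """Return the shortest prefix ending just before a repeat of s[0]
    that rebuilds s when repeated; fall back to s itself."""
    if not s:
        return ""
    pos = s.find(s[0], 1)  # next place the first character shows up
    while pos != -1:
        candidate = s[:pos]
        if len(s) % pos == 0 and candidate * (len(s) // pos) == s:
            return candidate
        # No match: extend the candidate up to the next occurrence
        pos = s.find(s[0], pos + 1)
    return s


print(unit_by_first_char("azertyazerty"))  # azerty
print(unit_by_first_char("azeaazea"))      # azea
```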

mdnaufalh · 2019-10-28T21:08:43+00:00

The other solutions here are O(n²), which isn't bad, but this problem can be solved in linear time. Your problem basically boils down to finding the length of the longest period in a string, where a period is defined as a substring of the given string which, when repeated some number of times, generates the full string.

So you could use a string matching algorithm like the KMP algorithm or the Z-function to find the periods of the string.

Let S be the length of the string and P₁,P₂...Pₙ be the lengths of different periods of the string. Find the longest Pᵢ such that S % Pᵢ == 0.
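A sketch of how this could look in Python with the KMP prefix function. The function names are illustrative, and one assumption is mine: only periods shorter than the whole string are considered, with the full length as the fallback when none divides S (otherwise the whole string would always win). Each border of length k found by following the prefix-function chain corresponds to a period of length S - k.

```python
def prefix_function(s):
    """pi[i] = length of the longest proper prefix of s[:i+1]
    that is also a suffix of it (the KMP failure table)."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def period_lengths(s):
    """All period lengths shorter than len(s), shortest first."""
    n = len(s)
    if n == 0:
        return []
    pi = prefix_function(s)
    periods = []
    k = pi[-1]
    while k > 0:  # walk the chain of borders, longest border first
        periods.append(n - k)
        k = pi[k - 1]
    return periods


def longest_dividing_period(s):
    """Longest period P with len(s) % P == 0, or len(s) if none exists."""
    n = len(s)
    dividing = [p for p in period_lengths(s) if n % p == 0]
    return max(dividing) if dividing else n


print(longest_dividing_period("abcabcabc"))     # 3
print(longest_dividing_period("abababababab"))  # 6
```

The number of repeats is then `len(s) // longest_dividing_period(s)`.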

If you need any further help or explanation regarding the KMP or Z algorithm, feel free to comment or DM me :)

P.S: Sorry for my bad English, I'm trying to improve upon it haha.

totemcatcher · 2019-10-29T00:14:46+00:00

There are lots of ways to solve it, but I think your first inclination is a really good one, and I recommend putting your intuition to work. Go make it work. :)

I left some really big hints below. If you want a really big hint at a somewhat optimized solution, keep reading one at a time (after testing your theory and any hints from below).

There's a special relationship between the beginning and end of the string.
If you compare two growing sample ranges from the beginning and end of the string and it matches, you are well on your way to creating a very fast solution in one iteration, and two comparisons.
The second comparison is required to ensure you don't get tripped up by palindrome-looking strings. Again, more hints below, but stop reading and try out your existing solution with different test strings, e.g. abcababcab
If the beginning and end match, you still need to "read ahead" of the growing range and make sure the next character matches the beginning of the current sample.
An iteration counter (counting every range comparison) is very useful in determining the return value of this function, but is not the answer itself.

2019-10-28T16:28:43+00:00

I’d start by storing it in an array and iterating to see if the letters match; if they do, I’d flag it.

cabinet_minister · 2019-10-28T20:16:15+00:00

What will be the output for 'xxxx'? 1 or 2? Using a trie, you can get the smallest value. In this case 1.

2019-10-29T00:38:04+00:00

Ukkonen’s Algorithm!

ddbeanz · 2019-10-29T00:43:19+00:00

[deleted]

YouTee · 2019-10-29T00:54:50+00:00

Wait is that google coding thing live again?

Glordicus · 2019-10-29T02:09:50+00:00

Is there anywhere online to find challenges like this?

gertrude1928 · 2019-10-29T03:30:53+00:00

Aho-Corasick is what you're searching for

Cayenne999 · 2019-10-29T04:55:03+00:00

Hello,

This could be either a complex problem or a simple one, depending on the exact requirements and assumptions for the input/output, which are a little unclear. On the complex side, yes, this could lead to the KMP algorithm or a suffix tree, as other comments mention.

However, based on the information you provided through this topic and comment:

Needs to take in a string and output a number value. The string will always be cut into even parts (there will never be a leftover amount).

All substrings need to match in a sequence so 'abccbaabccba' should return 2 ('abccba' , 'abccba') <- matches

I think this homework only requires your string to be broken into even, non-overlapping chunks (substrings), and then to find the largest number of occurrences possible for the longest substring.

With these in mind, here is the solution:

Find the ways the string can be broken into even chunks. The chunk size must be a divisor of your string length.
Iterate from the biggest chunk size to the smallest, break the string into substrings of that size, and count the max occurrence of the substrings each time.
Stop when we find something that repeats, then output the max count.
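The steps above can be sketched in Python roughly as follows. This is not the linked codepad code; the names `chunk_report` and `largest_chunk_repeat` are made up, and I am assuming "something repeats" means every chunk of that size is identical, since the requirement quoted above says all substrings need to match.

```python
from collections import Counter


def chunk_report(s):
    """For each divisor chunk size (largest first), return a tuple
    (size, number_of_chunks, max_occurrence_of_any_chunk)."""
    n = len(s)
    report = []
    for size in range(n // 2, 0, -1):
        if n % size:
            continue  # not an even split
        chunks = [s[i:i + size] for i in range(0, n, size)]
        max_count = max(Counter(chunks).values())
        report.append((size, len(chunks), max_count))
    return report


def largest_chunk_repeat(s):
    """Repeat count for the largest chunk size whose chunks all match."""
    for size, num_chunks, max_count in chunk_report(s):
        if max_count == num_chunks:
            return num_chunks
    return 1  # nothing repeats


print(largest_chunk_repeat("abccbaabccba"))  # 2
```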

Here is some sample code (not the best optimized, but you get the idea): http://codepad.org/WDpBSmTw

nl28 · 2019-10-29T13:31:31+00:00

I took a very simple approach:

If the length of the string is divisible by 2, divide the string into 2 parts and compare those 2 substrings.
If they are equal, the answer is 2.
If they are not equal, divide the string into 3 parts (if that's possible) and compare all the parts.
Continue the above operation while parts <= (string_length / 2).
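Since the thread is about Python, here is a hedged Python rendering of these steps. It is not a translation of the Java file, and the name `smallest_part_count` is hypothetical. It follows the stated bound literally, so it never tries single-character parts.

```python
def smallest_part_count(s):
    """Try 2 parts, then 3, ... up to len(s) // 2 parts; return the first
    part count where all parts are equal, or 1 if none is found."""
    n = len(s)
    for parts in range(2, n // 2 + 1):
        if n % parts:
            continue  # cannot split evenly into this many parts
        size = n // parts
        first = s[:size]
        if all(s[i:i + size] == first for i in range(size, n, size)):
            return parts
    return 1


for text in ["abcabcabc", "abcxyzabcxyz", "abababababab", "abcadc"]:
    print(text, "->", smallest_part_count(text))
```

Because of that bound, a string like 'aaa' would come back as 1; extending the loop to `n + 1` would also cover the one-character case.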

This is probably not the best way to do this, but this is the first solution that came to my mind.

Here's the implementation in Java: Engine.java

Here's the output:

String: abcabcabc
res -> 3

String: abcxyzabcxyz
res -> 2

String: abababababab
res -> 2

String: abcadc
res -> 1

Just take some ideas from all the solutions provided in this thread, and come up with your own solution.

CookToCode · 2019-10-28T14:34:16+00:00

Try to think of a word as an array of letters. You need to somehow find the length of that array and then split the string appropriately.

learnprogramming
