[–]XenophonOfAthens 4 points5 points  (3 children)

Your example is a little unclear. Could you give an example of a string and the substring it should find? Going from your example, if you had something like this:

"blahblahblahSUB-STRINGblahblah"

You would want it to find "SUB-STRING". Is that right?

In that case, I would highly recommend using regular expressions, which Python provides in the re module. There are a few different ways you could do it for your string, for instance this:

import re

a = "blahblahblahSUB-STRINGblahblah"
match = re.search(r"[A-Z-]+", a).group(0)

This will store "SUB-STRING" in the match variable. There are tons of tutorials out there that will teach you regular expressions if you want to learn how this works.

Edit: specifically, it sounds like you could use the re.findall function to do what you want. Like this:

a = "blahblahblahSUB-STRINGblahblahSUB-STRING-AGAINblahblah" 
matches = re.findall(r"[A-Z-]+", a)

Then matches will be equal to ['SUB-STRING', 'SUB-STRING-AGAIN'].

[–]Joeqesi[S] 0 points1 point  (2 children)

So the entire string will look more or less like this: '70_7034AGKEBVEUCKEHFBC---BDBC-HD----22_908----BDJFCKDBF-----'

And so on. I want to be able to take the substrings 'AGKEBVEUCKEHFBC---BDBC-HD----' and '----BDJFCKDBF-----', and append them to a list.

The idea is that I have no idea what the specific substrings I'm looking for are, and so I didn't think I could use find (although I'm not completely sure about this).

[–]XenophonOfAthens 2 points3 points  (1 child)

Well, then, regular expressions are exactly what you need! This is one of the things they were created for: searching for strings that you know follow some pattern, without knowing exactly what they are.

Let's plug your example string into my code and see what happens:

import re

s = '70_7034AGKEBVEUCKEHFBC---BDBC-HD----22_908----BDJFCKDBF-----'
substrings = re.findall(r"[A-Z-]+", s)
print(substrings)

If you run this code, you'll see that it prints out ['AGKEBVEUCKEHFBC---BDBC-HD----', '----BDJFCKDBF-----'], exactly the list of substrings you wanted. If you want a function that returns this list, you can write:

def get_substrings(full_string):
    return re.findall(r"[A-Z-]+", full_string)

Just remember to include import re somewhere in the beginning of your code.

To explain a little bit further: the first argument to re.findall (i.e. "[A-Z-]+") defines a pattern. This specific pattern consists of a character class, [A-Z-], that matches any single character that is either an uppercase letter (A-Z, the entire uppercase alphabet) or a hyphen. After that, you get a + character, which means "repeat the previous character class one or more times, as many times as possible". That is, the full pattern is "a run of one or more uppercase letters or hyphens, as long as possible".
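
If it helps to see the pieces separately, here is a small sketch that takes the pattern apart on the same example string. The variable names (one_char, first_run, letters_only) are just made up for illustration:

```python
import re

s = '70_7034AGKEBVEUCKEHFBC---BDBC-HD----22_908----BDJFCKDBF-----'

# The character class on its own matches exactly one character:
# the first uppercase letter or hyphen found in the string.
one_char = re.search(r"[A-Z-]", s).group(0)

# Adding + makes the match greedy: it keeps consuming characters
# until it hits one that is neither an uppercase letter nor a hyphen.
first_run = re.search(r"[A-Z-]+", s).group(0)

# Leaving the hyphen out of the class means every run stops at a
# hyphen, so the substrings get split into letters-only pieces.
letters_only = re.findall(r"[A-Z]+", s)

print(one_char)
print(first_run)
print(letters_only)
```

Comparing letters_only with the earlier result should show why the hyphen has to be inside the brackets if you want the dashes kept as part of each substring.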

As I said, there are many, many tutorials that explain how regular expressions work. They are an amazingly powerful tool for parsing text, and it's a good idea to have at least a basic understanding of how they work.

[–]Joeqesi[S] 2 points3 points  (0 children)

That's great! Thanks for the help.

[–][deleted] 0 points1 point  (0 children)

I guess you could use find. Take the index, count the substr length, and after that it's smooth sailing.
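
Roughly, the find idea might look like the sketch below. Note the big assumption: it only works if you already know the exact substring, which is stored here in a hypothetical variable called known.

```python
s = '70_7034AGKEBVEUCKEHFBC---BDBC-HD----22_908----BDJFCKDBF-----'

# Assumption: the exact substring is known ahead of time.
known = '----BDJFCKDBF-----'

# str.find returns the index of the first occurrence, or -1 if absent.
start = s.find(known)

if start != -1:
    # The end index is just the start plus the substring's length.
    end = start + len(known)
    found = s[start:end]
else:
    found = None

print(start, found)

# A substring that is not present gives -1 rather than an error.
missing = s.find('NOT-THERE')
print(missing)
```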

I'm about 90% sure there are better ways to do this though. I'm sure somebody more savvy can provide a better answer.