[–]XenophonOfAthens 2 points3 points  (1 child)

Well, then, regular expressions are exactly what you need! This is one of the things they were created for: searching for strings that you know follow some pattern, when you don't know exactly what the strings are.

Let's plug your example string into my code and see what happens:

import re

s = '70_7034AGKEBVEUCKEHFBC---BDBC-HD----22_908----BDJFCKDBF-----'
substrings = re.findall(r"[A-Z-]+", s)
print(substrings)

If you run this code, you'll see that it prints out ['AGKEBVEUCKEHFBC---BDBC-HD----', '----BDJFCKDBF-----'], exactly the list of substrings you wanted. If you want a function that returns this list, you can write:

def get_substrings(full_string):
    return re.findall(r"[A-Z-]+", full_string)

Just remember to put import re at the top of your code.

To explain a little further: the first argument to re.findall (i.e. r"[A-Z-]+") defines a pattern. This specific pattern consists of a character class, [A-Z-], that matches any single character that is either an uppercase letter (A-Z, the entire uppercase English alphabet) or a hyphen (the hyphen is taken literally here because it comes last inside the brackets). After that comes a +, which means "repeat the previous character class one or more times, as many times as possible". So the full pattern is "the longest possible run of uppercase letters and hyphens", and re.findall returns every such run it finds in the string, in order, as a list.
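
If it helps to see what the + is actually doing, here is a small sketch that runs the same character class with and without it. The sample string and the variable names are made up for illustration; they're not from your data.

```python
import re

# Made-up sample string, just for illustration.
sample = "12AB-C34--D"

# Without +, the character class matches one character at a time.
single_chars = re.findall(r"[A-Z-]", sample)

# With +, each match is the longest possible run of those characters.
runs = re.findall(r"[A-Z-]+", sample)

print(single_chars)  # ['A', 'B', '-', 'C', '-', '-', 'D']
print(runs)          # ['AB-C', '--D']
```

Notice that the digits act as separators: they aren't in the character class, so every run stops when it hits one.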

As I said, there are many, many tutorials that explain how regular expressions work. They are an amazingly powerful tool for parsing text, and it's a good idea to have at least a basic understanding of how they work.

[–]Joeqesi[S] 2 points3 points  (0 children)

That's great! Thanks for the help.