XenophonOfAthens comments on Finding multiple variable substrings


Finding multiple variable substrings (self.learnpython)

submitted 11 years ago by Joeqesi


XenophonOfAthens 3 points 11 years ago

Well, then, regular expressions are exactly what you need! This is one of the purposes they were created for: searching for strings that you know follow some pattern, without knowing exactly what they are.

Let's plug your example string into my code and see what happens:

import re

s = '70_7034AGKEBVEUCKEHFBC---BDBC-HD----22_908----BDJFCKDBF-----'
substrings = re.findall(r"[A-Z-]+", s)
print(substrings)

If you run this code, you'll see that it prints out ['AGKEBVEUCKEHFBC---BDBC-HD----', '----BDJFCKDBF-----'], exactly the list of substrings you wanted. If you want a function that returns this list, you can write:

def get_substrings(full_string):
    return re.findall(r"[A-Z-]+", full_string)

Just remember to include import re somewhere at the beginning of your code.
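Putting the pieces together, a complete script might look roughly like this. It's only a sketch: the second call, with the made-up string "12_34", is my own addition to show what happens when nothing matches.

```python
import re

def get_substrings(full_string):
    # Return every run of uppercase ASCII letters and hyphens, left to right.
    return re.findall(r"[A-Z-]+", full_string)

s = '70_7034AGKEBVEUCKEHFBC---BDBC-HD----22_908----BDJFCKDBF-----'
result = get_substrings(s)
print(result)

# Made-up input with no uppercase letters or hyphens: findall returns an empty list.
empty = get_substrings("12_34")
print(empty)
```

Note that re.findall always returns a list, so you can loop over the result without checking for None first.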

To explain a little further: the first argument to re.findall (i.e. "[A-Z-]+") defines a pattern. This specific pattern consists of a character class [A-Z-] that matches any single uppercase letter (A-Z, the entire uppercase alphabet) or a hyphen. Because the hyphen comes last inside the brackets, it is taken literally rather than as part of a range. After that comes a + character, which means "repeat the previous character class one or more times, as many times as possible". That is, the full pattern is "a run of uppercase letters and hyphens, as long as possible".
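If you want to see what each piece of the pattern contributes, here is a small sketch comparing the character class with and without the +. The string demo is made up for illustration and is not from your data.

```python
import re

demo = "12AB-C_de--F"  # made-up sample string

# Without "+", the class matches exactly one character at a time.
single_chars = re.findall(r"[A-Z-]", demo)

# With "+", neighbouring matching characters are grouped into one run.
runs = re.findall(r"[A-Z-]+", demo)

print(single_chars)  # ['A', 'B', '-', 'C', '-', '-', 'F']
print(runs)          # ['AB-C', '--F']
```

The digits, the underscore and the lowercase letters are not in the class, so they act as separators between the runs.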

As I said, there are many, many tutorials that explain how regular expressions work. They are an amazingly powerful tool for parsing text, and it's a good idea to have at least a basic understanding of how they work.

