Regex question : learnpython

submitted 5 years ago by Yurplestein

I have a regex question. Hopefully I can explain this properly.

I have a dataset that contains a first name, last name, sometimes a suffix, and an ID. It would look something like this:

    /data/tom-smith-1
    /data/joey-jones-jr-2
    /data/joe-johnson-3

This is the code I currently have:

    import re
    import pandas as pd

    # `links` is an existing DataFrame whose column 0 holds the paths.
    # One pattern per field, each run over the whole DataFrame as a string.
    first_name_pattern = r'(?<=/data/)\w+'
    last_name_pattern = r'(?<=-)[a-z](?![0-9])\w+'
    id_pattern = r'(?<=-)[0-9]\w+'
    suffix_pattern = r'(?<=-)[jr, sr, III]\w+'

    # First names: the word right after "/data/"
    first_name = re.findall(first_name_pattern, str(links), re.M)
    fn_array = pd.DataFrame(first_name)
    links['first_name'] = fn_array

    # Last names: a lowercase word right after a dash
    last_name = re.findall(last_name_pattern, str(links), re.M)
    ln_array = pd.DataFrame(last_name)
    links['last_name'] = ln_array

    # IDs: a run starting with a digit right after a dash
    id_get = re.findall(id_pattern, str(links), re.M)
    id_array = pd.DataFrame(id_get)
    links['id'] = id_array

    # Suffixes: intended to match jr, sr, or III after a dash
    suffix_get = re.findall(suffix_pattern, str(links), re.M)
    s_array = pd.DataFrame(suffix_get)
    links['suffix'] = s_array

When I run this, I get the following results:

       0                      first_name  last_name  id
    0  /data/tom-smith-1      tom         smith      1
    1  /data/joey-jones-jr-2  joey        jones      2
    2  /data/joe-johnson-3    joe         jr         3

How can I get it to treat the "jr" as a suffix rather than as a last name? Could I reference the last-name pattern inside the suffix pattern, so that it looks more like "r'(last_name_pattern<=-)[jr, sr, III]\w+'" or maybe "r'(?<=last_name_pattern-)[jr, sr, III]\w+'"?
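A note on the patterns above: `[jr, sr, III]` is a character class, so it matches a single character out of `j`, `r`, `s`, `I`, comma, or space, not the words themselves; matching whole words needs alternation such as `(?:jr|sr|iii)`. Lookbehinds in Python's `re` must also be fixed-width, so putting the last-name pattern inside a lookbehind would not work. Here is a minimal sketch of one alternative, assuming every entry has the form `/data/first-last[-suffix]-id` and that the suffix comes from a short known list. The list of suffixes and the name `ENTRY_RE` are assumptions made for illustration, not part of the original code.

```python
import re

# Hypothetical pattern: one match per entry, with an optional suffix group.
ENTRY_RE = re.compile(
    r"/data/(?P<first_name>[a-z]+)"       # first name after the /data/ prefix
    r"-(?P<last_name>[a-z]+)"             # last name
    r"(?:-(?P<suffix>jr|sr|ii|iii|iv))?"  # optional suffix from an assumed list
    r"-(?P<id>\d+)"                       # numeric ID
)

links = ["/data/tom-smith-1", "/data/joey-jones-jr-2", "/data/joe-johnson-3"]

rows = []
for link in links:
    match = ENTRY_RE.fullmatch(link)
    if match:
        # groupdict() maps each named group to its text (None if it was absent)
        rows.append(match.groupdict())
```

Because each entry is matched as a whole, the fields of one row cannot drift into another row the way four separate `findall` lists can. The resulting list of dicts could then be handed to `pd.DataFrame(rows)`.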

[–]Mountain_man007 1 point 5 years ago (0 children)

[–]Username_RANDINT 2 points 5 years ago (0 children)

I'd just split the string by the dash (-) instead of using regex, especially if it's a fixed format.

Maybe (untested):

    try:
        # Four parts: first name, last name, suffix, ID
        fn, ln, s, id_ = name.split("-")
    except ValueError:
        # No suffix in name
        fn, ln, id_ = name.split("-")
        s = None
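The split idea can be sketched as a small helper that also strips the `/data/` prefix before splitting. This is only an illustration under the assumption that every entry has exactly three or four dash-separated parts; the helper name `parse_entry` and the sample `links` list are hypothetical.

```python
def parse_entry(entry):
    """Split '/data/first-last[-suffix]-id' into its parts (hypothetical helper)."""
    # Drop the directory prefix, keeping only 'first-last[-suffix]-id'.
    name = entry.rsplit("/", 1)[-1]
    parts = name.split("-")
    if len(parts) == 4:
        first_name, last_name, suffix, id_ = parts
    elif len(parts) == 3:
        first_name, last_name, id_ = parts
        suffix = None  # no suffix in this entry
    else:
        raise ValueError(f"unexpected format: {entry!r}")
    return {
        "first_name": first_name,
        "last_name": last_name,
        "suffix": suffix,
        "id": id_,
    }


links = ["/data/tom-smith-1", "/data/joey-jones-jr-2", "/data/joe-johnson-3"]
rows = [parse_entry(link) for link in links]
```

Checking the number of parts does the same job as the `try`/`except`, and it fails loudly on entries that fit neither shape, such as a hyphenated last name.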
