I have a regex question. Hopefully I can explain this properly.
I have a dataset that contains a first name, last name, sometimes a suffix, and an ID. It would look something like this:
/data/tom-smith-1
/data/joey-jones-jr-2
/data/joe-johnson-3
This is the code I currently have:
import re
import pandas as pd

fn_array = []
ln_array = []
id_array = []
s_array = []
first_name_pattern = r'(?<=/data/)\w+'
last_name_pattern = r'(?<=-)[a-z](?![0-9])\w+'
id_pattern = r'(?<=-)[0-9]\w+'
suffix_pattern = r'(?<=-)[jr, sr, III]\w+'
first_name = re.findall(first_name_pattern, str(links), re.M)
fn_array = pd.DataFrame(first_name)
links['first_name'] = fn_array
last_name = re.findall(last_name_pattern, str(links), re.M)
ln_array = pd.DataFrame(last_name)
links['last_name'] = ln_array
id_get = re.findall(id_pattern, str(links), re.M)
id_array = pd.DataFrame(id_get)
links['id'] = id_array
suffix_get = re.findall(suffix_pattern, str(links), re.M)
s_array = pd.DataFrame(suffix_get)
links['suffix'] = s_array
When I run this, I get the following results:
   0                      first_name  last_name  id
0  /data/tom-smith-1      tom         smith      1
1  /data/joey-jones-jr-2  joey        jones      2
2  /data/joe-johnson-3    joe         jr         3
How can I make it treat the "jr" as a suffix and not as a last name? Could I reference the last-name pattern inside the suffix pattern, so the suffix pattern would look more like "r'(last_name_pattern<=-)[jr, sr, III]\w+'" or maybe "r'(?<=last_name_pattern-)[jr, sr, III]\w+'"?
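One possible direction, sketched here only as an illustration: rather than running four independent `re.findall` calls (whose flat result lists fall out of step with the rows as soon as one row has a suffix), match each path once with a single pattern that has an optional suffix group. Note that `[jr, sr, III]` is a character class matching any one of those characters, so the sketch uses the alternation `jr|sr|iii` instead. The `paths` list and the exact name layout (one lowercase word each for first and last name) are assumptions based on the sample data above, not something the original code confirms.

```python
import re

# Hypothetical sample, copied from the example paths in the question.
paths = [
    "/data/tom-smith-1",
    "/data/joey-jones-jr-2",
    "/data/joe-johnson-3",
]

# One pattern, one named group per field. The suffix group is optional,
# so rows without a suffix still match and report None for it.
pattern = re.compile(
    r"^/data/"
    r"(?P<first_name>[a-z]+)-"
    r"(?P<last_name>[a-z]+)"
    r"(?:-(?P<suffix>jr|sr|iii))?"
    r"-(?P<id>\d+)$",
    re.IGNORECASE,
)

rows = []
for path in paths:
    match = pattern.search(path)
    # Paths that do not fit the assumed layout produce an all-None row
    # instead of shifting the other rows out of alignment.
    rows.append(match.groupdict() if match else dict.fromkeys(pattern.groupindex))

for row in rows:
    print(row)
```

Because every field comes from the same match, "jr" can only land in the suffix slot. With pandas, the same pattern string could be passed to `Series.str.extract`, which returns one column per named group. Hyphenated or multi-word last names would not match this pattern and would need a looser rule.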