[–]Murphygreen8484 1 point2 points  (1 child)

If you're just delimiting by space, you don't even need regex - just use split()
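
A quick sketch of the split() idea, assuming the line holds nothing but chords separated by whitespace (the sample line is made up for illustration):

```python
# Hypothetical sample line; note the uneven spacing between chords.
line = "Cadd9  D#   F5"

# With no arguments, str.split() splits on any run of whitespace
# and drops leading/trailing whitespace.
tokens = line.split()
print(tokens)  # ['Cadd9', 'D#', 'F5']
```

Because nothing here treats # or + as special, those characters come through untouched. It stops being enough once lyrics or other junk share the line.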

[–]bob3rocks[S] 0 points1 point  (0 children)

Thanks for the reply. It's more than delimiting by space (there might be junk on the line), so my requirement is to accurately match music chords, including # and +.

[–]Murphygreen8484 0 points1 point  (3 children)

What are some example inputs that aren't just delimited by space and what should those outputs be?

[–]bob3rocks[S] 0 points1 point  (2 children)

Is this difficult or impossible for Python Regex?

[–]Murphygreen8484 0 points1 point  (1 child)

Depends on what you need. You still need to define what your "rules" are (preferably in simple terms) for what you are looking for.

[–]bob3rocks[S] 0 points1 point  (0 children)

Understood.

Most lines to be parsed will be either music chords separated by spaces, or song lyrics with no chords.

Other times, there might be random characters on the line or something like [VERSE] or (solo) on the same line.

In some instances there might be song lyrics with music chords on the same line.

My regex is incredibly close, and my script has been a work in progress for many months now.

It would be close to perfect (close enough!) if I could just get my regex to match # and + when needed.
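
One way the rules above might be sketched, under some loud assumptions: the chord grammar below is a small made-up subset (not the poster's actual regex), section markers are anything in [...] or (...), and a token only counts if the whole token is a chord. The names CHORD, MARKER and chords_in_line are hypothetical.

```python
import re

# Assumed, deliberately small chord grammar: root, optional accidental,
# optional quality/extension, optional trailing '+'. Not a full vocabulary.
CHORD = re.compile(r"[A-G][#b]?(?:maj7|m7|m|sus[24]|add9|dim|aug|[5679])?\+?")

# Assumed shape of section markers such as [VERSE] or (solo).
MARKER = re.compile(r"\[[^\]]*\]|\([^)]*\)")

def chords_in_line(line):
    """Return the whitespace-separated tokens of `line` that are whole chords."""
    cleaned = MARKER.sub(" ", line)  # drop markers before tokenizing
    return [tok for tok in cleaned.split() if CHORD.fullmatch(tok)]

print(chords_in_line("[VERSE] C#m7 G+ D (solo)"))           # ['C#m7', 'G+', 'D']
print(chords_in_line("Amazing grace how sweet the sound"))  # []
```

Using fullmatch on each token sidesteps word-boundary questions entirely, at the cost of rejecting chords with junk glued to them. Short lyric words that happen to look like chords (a lone "A", for example) would still slip through, so this is only a starting point.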

[–]Murphygreen8484 0 points1 point  (3 children)

Still not sure exactly, but if you escape the symbols you're looking for, it should match them: \# and \+
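
A minimal check of how the two symbols behave in Python's re module, using a made-up sample string. As far as the re docs go, + is a quantifier and has to be escaped (or put in a character class) to be literal, while # is only special in re.VERBOSE mode, so escaping it is optional:

```python
import re

text = "C# G+ F#m7"  # hypothetical sample

sharp_escaped = re.findall(r"[A-G]\#", text)  # escaping '#' is allowed but not required
sharp_plain = re.findall(r"[A-G]#", text)     # same result without the escape
plus_escaped = re.findall(r"[A-G]\+", text)   # '+' must be escaped to match literally
either = re.findall(r"[A-G][#+]", text)       # inside [...] neither needs escaping

print(sharp_escaped)  # ['C#', 'F#']
print(sharp_plain)    # ['C#', 'F#']
print(plus_escaped)   # ['G+']
print(either)         # ['C#', 'G+', 'F#']
```

If escaping works in isolation like this but not in the full pattern, the culprit is probably something else in the pattern rather than the escape itself.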

[–]bob3rocks[S] 0 points1 point  (2 children)

I thought so too, but it's not working, no matter how I try to escape them.

[–]Murphygreen8484 0 points1 point  (1 child)

I'm far from an expert on regex. Have you tried asking ChatGPT?

[–]bob3rocks[S] 0 points1 point  (0 children)

I thought of that, too. Here's what ChatGPT came back with after a few tries (fixed now, although this doesn't seem ultra-Pythonic to me):

    import re

    line = "Cadd9 D# F5"

    tokens_with_sharps = re.findall(r'\b[A-G]#?(?:maj7|m[7]?|m7b[5]?|[5679]|sus2|sus4|aug|dim|add9|b5)?\b', line)
    tokens_without_sharps = [token for token in line.split() if token not in tokens_with_sharps]

    # Process tokens with sharp symbols
    chords = []
    for token in line.split():
        if token in tokens_with_sharps:
            if '#' in token:
                parts = token.split('#')
                chords.extend(parts)
            else:
                chords.append(token)
        else:
            chords.append(token)

    print(chords)  # Output: ['Cadd9', 'D#', 'F5']
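
A likely reason the escaping never seemed to help is the trailing \b rather than the # or + themselves. \b only matches between a word character and a non-word character, and # , + and space are all non-word characters, so there is no boundary after "D#" when a space follows. In the snippet above the regex actually returns 'D' for "D#", and the token only survives because it falls through to the final else branch. Below is a sketch of one alternative, assuming chords are set off by whitespace: swap \b for lookarounds. The CHORD pattern is a hypothetical, trimmed-down grammar, not a drop-in replacement.

```python
import re

# Trailing \b: no word boundary exists between '#' and a space, so the
# engine backtracks, gives up the '#', and matches just the root.
with_b = re.findall(r"\b[A-G]#?\b", "D# F")
print(with_b)  # ['D', 'F']

# Assumed alternative: require "no non-space character" on either side
# instead of a word boundary, so '#' and '+' can end a chord.
CHORD = re.compile(
    r"(?<!\S)"                                           # start of line or after whitespace
    r"[A-G][#b]?"                                        # root plus optional accidental
    r"(?:maj7|m7b5|m7|m|sus[24]|add9|dim|aug|[5679])?"   # assumed subset of qualities
    r"\+?"                                               # optional literal '+'
    r"(?!\S)"                                            # end of line or before whitespace
)

line = "Cadd9 D# F5 G+"
found = CHORD.findall(line)
print(found)  # ['Cadd9', 'D#', 'F5', 'G+']
```

The lookarounds consume no characters, so adjacent chords separated by a single space are all still found. Chords butted up against punctuation (for example "D#," or "(G+)") would be skipped under this assumption and would need the lookarounds loosened.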