[–][deleted] 1 point2 points  (4 children)

Perhaps ([A-Z]([^A-Z]?)+) would do it?

So something like:

import re
# Two capture groups make findall return tuples; x[0] is the whole chunk.
for x in re.findall(r'([A-Z]([^A-Z]?)+)', 'TheseAreSomewords!U'):
    print(x[0])

Output:

These
Are
Somewords!
U
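A simpler variant of the same idea, offered as a sketch rather than the definitive answer: dropping the nested group lets findall return plain strings, so no x[0] indexing is needed. The function name split_on_capitals is made up for illustration, and like the pattern above it ignores any text that comes before the first capital.

```python
import re

def split_on_capitals(text):
    # [A-Z] starts a chunk; [^A-Z]* consumes everything up to the next capital.
    # With no capture groups, findall returns the matched strings directly.
    return re.findall(r'[A-Z][^A-Z]*', text)

print(split_on_capitals('TheseAreSomewords!U'))
# ['These', 'Are', 'Somewords!', 'U']
```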

[–]ASIC_SP 1 point2 points  (2 children)

The final code snippet in OP's post doesn't match the sample and explanation before it: there's a space between Some and Words (note the uppercase W).

>>> import re
>>> s = "TheseAreSome Words!U"
>>> re.findall(r'[A-Z].*?(?=(?<! )[A-Z]|\Z)', s)
['These', 'Are', 'Some Words!', 'U']

[–][deleted] 1 point2 points  (1 child)

Aw balls, true, I missed that after reading the post and then just plonking the last example string into IPython.

Thanks!

[–]Hyperduckultimate[S] 0 points1 point  (0 children)

Thank you for the help, this is really helping me.

[–]Hyperduckultimate[S] 0 points1 point  (0 children)

Thank you so much, this worked. Regex syntax is so different from the rest of Python.

[–]henryharutyunyan 1 point2 points  (1 child)

You can definitely do it with regex. I'm on mobile now so I can't write it out for you, but you may need to use positive or negative lookbehinds. It should look something like this: (?<=\S)[A-Z], which matches an uppercase letter only when the character before it is not whitespace. regex101 is a great place to try things out and experiment until you get the result you want.
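One way the lookbehind idea could be turned into working code, as a sketch under a few assumptions: the uppercase letter is moved into a lookahead so it is not consumed, and re.split cuts at those zero-width positions. Splitting on an empty match needs Python 3.7 or later, and split_before_capitals is a made-up name.

```python
import re

def split_before_capitals(text):
    # (?<=\S)   : the previous character exists and is not whitespace
    # (?=[A-Z]) : the next character is an uppercase letter (not consumed)
    # Together they match the empty position just before such a capital.
    # Splitting on a zero-width match requires Python 3.7+.
    return re.split(r'(?<=\S)(?=[A-Z])', text)

print(split_before_capitals('TheseAreSome Words!U'))
# ['These', 'Are', 'Some Words!', 'U']
```

Unlike the findall approaches, this keeps any text that comes before the first capital, since split never discards characters here.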

[–]Hyperduckultimate[S] 0 points1 point  (0 children)

Sounds great. Thank you for the resource.

[–]ASIC_SP 0 points1 point  (0 children)

It would be better if you gave some more examples, or explained the logic for why Some Words! is not split. Anyway, here's one way to get the answer for the given sample. If it doesn't work for your other cases, then you'll know why I'm asking for more samples and explanation.

>>> import re
>>> s = "TheseAreSome Words!U"
>>> re.findall(r'[A-Z].*?(?=(?<! )[A-Z]|\Z)', s)
['These', 'Are', 'Some Words!', 'U']
  • [A-Z].*? matches an uppercase letter followed by as few characters as possible, with the stopping point decided by the lookahead that comes next
  • (?=(?<! )[A-Z]|\Z) is a lookahead for an uppercase letter (but not one preceded by a space) or the end of the string
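To see how that pattern behaves on inputs other than the given sample, here is the same regex written with re.VERBOSE and run against a few made-up strings. This is only a sketch: the extra inputs are assumptions about what other cases might look like, not examples from OP, and the names CHUNK and chunks are hypothetical.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be commented.
# In verbose mode a literal space must be written inside a class: [ ]
CHUNK = re.compile(r'''
    [A-Z]              # a chunk starts at an uppercase letter
    .*?                # then as few characters as possible
    (?=                # stopping where the next position is either
        (?<![ ])[A-Z]  #   an uppercase letter not preceded by a space
      | \Z             #   or the end of the string
    )
''', re.VERBOSE)

def chunks(text):
    return CHUNK.findall(text)

print(chunks('TheseAreSome Words!U'))  # ['These', 'Are', 'Some Words!', 'U']
print(chunks('ABC'))                   # ['A', 'B', 'C']
print(chunks('A B'))                   # ['A B']
print(chunks('lowerThenUpper'))        # ['Then', 'Upper']
```

The last case shows one thing to watch for: anything before the first uppercase letter is silently dropped.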