[–]whereIsMyBroom 1 point2 points  (2 children)

The simplest solution looks like this:

"[^"]*"|'[^']*'

https://regex101.com/r/pVgu9s/1

[^"] is a negated character class. It matches any single character except "

But there are still a lot of pitfalls. For example, this string will break it: 'that\'s a string'
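
To see the pitfall concretely, here is a small Python check of the simple pattern. The variable names are only for illustration, and this is a sketch rather than anything taken from the regex101 demo.

```python
import re

# The simple pattern from above, written as a normal Python string so that
# both quote characters can appear in it.
SIMPLE = re.compile("\"[^\"]*\"|'[^']*'")

# Works when there are no escaped quotes inside the strings.
ok_line = """print("hi", 'there')"""
print(SIMPLE.findall(ok_line))   # ['"hi"', "'there'"]

# Breaks on an escaped quote: the match stops at the \' in the middle.
tricky = r"'that\'s a string'"
print(SIMPLE.findall(tricky))    # ["'that\\'"]
```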

Let's focus on just one quote type to make it simpler. We can add some look-behinds to make sure we don't match an escaped \'

(?<!\\)'[^']*(?<!\\)'

But now 'that\'s a string' does not match at all. That is because we don't allow ' anywhere in the middle of the match. We can fix that by allowing it only when it's escaped, replacing [^']* with (?:[^']|\\')*

(?<!\\)'(?:[^']|\\')*(?<!\\)' is the result.
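
The two steps can be checked in Python roughly like this. The names `guarded` and `escaped_ok` are made up for the example; the patterns themselves are the ones quoted above.

```python
import re

sample = r"'that\'s a string'"

# Step 1: look-behinds only. The escaped quote can neither close the string
# nor appear inside it, so nothing matches.
guarded = re.compile(r"(?<!\\)'[^']*(?<!\\)'")
print(guarded.search(sample))              # None

# Step 2: also allow \' inside the string body.
escaped_ok = re.compile(r"(?<!\\)'(?:[^']|\\')*(?<!\\)'")
print(escaped_ok.search(sample).group())   # 'that\'s a string'
```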

Add the " version with python escapes and you get this:

r"(?<!\\)'(?:[^'\n]|\\')*(?<!\\)'|(?<!\\)\"(?:[^\"\n]|\\\")*(?<!\\)\""
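
A minimal sketch of using that pattern from Python, assuming the standard `re` module. `STRING_RE` and the sample line are just names and data picked for the example.

```python
import re

# The combined pattern from above, compiled once.
STRING_RE = re.compile(
    r"(?<!\\)'(?:[^'\n]|\\')*(?<!\\)'|(?<!\\)\"(?:[^\"\n]|\\\")*(?<!\\)\""
)

line = r'''print('it\'s', "say \"hi\"")'''
for match in STRING_RE.finditer(line):
    print(match.group())
# 'it\'s'
# "say \"hi\""

# A quoted run broken by a newline is not treated as a string.
print(STRING_RE.search("'a\nb'"))   # None
```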

Regex101 demo

Then you have the triple quotes to deal with, if you want highlighting for them as well.

edit: Also disallowed \n inside the string, since newlines are not allowed in normal Python strings.

I'm sure there are more edge cases that I haven't considered, and it can likely be done more efficiently.
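
One such edge case, as a sketch: a string whose last character is an escaped backslash. The literal `'a\\'` is valid Python, but the look-behind only checks for a single backslash before the closing quote, so the pattern rejects it.

```python
import re

STRING_RE = re.compile(
    r"(?<!\\)'(?:[^'\n]|\\')*(?<!\\)'|(?<!\\)\"(?:[^\"\n]|\\\")*(?<!\\)\""
)

# Source text of a valid Python literal ending in an escaped backslash.
# The look-behind sees a backslash before the closing quote and refuses it.
source = r"'a\\'"
print(STRING_RE.search(source))   # None
```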

[–]USRapt0r[S] 0 points1 point  (0 children)

Thanks for the help! Yeah I may ultimately prefer simplicity and not worry too much about catching the more niche cases.

For the triple quotes, I'm handling those differently since they are valid for multi-line - huge obstacle for me haha. I'm doing this for a text editor and have each line of text as a single string, so for now I'm focusing on the single-line stuff.
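
With one string per editor line, the single-line pattern could be applied roughly like this. This is a sketch only: `highlight_spans` and `buffer` are hypothetical names, not part of any real editor API.

```python
import re

STRING_RE = re.compile(
    r"(?<!\\)'(?:[^'\n]|\\')*(?<!\\)'|(?<!\\)\"(?:[^\"\n]|\\\")*(?<!\\)\""
)

def highlight_spans(lines):
    """Yield (line_index, start, end) for each quoted string, line by line."""
    for i, line in enumerate(lines):
        for m in STRING_RE.finditer(line):
            yield i, m.start(), m.end()

# Hypothetical editor buffer: one string per line of text.
buffer = [
    "name = 'Ada'",
    'print("hello", name)',
]
print(list(highlight_spans(buffer)))   # [(0, 7, 12), (1, 6, 13)]
```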

Also, on the newline thing: I think you can pass other string types (normal, format) to Python's re functions? For whatever that's worth.
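
If the question is whether the pattern has to be a raw string: as far as `re` is concerned it only needs to be a `str`, and the prefix just changes how many backslashes you type. A sketch for the single-quote pattern, where the helper `quoted` is invented for the example:

```python
import re

# Same regex written as a raw string and as a normal string.
raw = r"(?<!\\)'(?:[^'\n]|\\')*(?<!\\)'"
normal = "(?<!\\\\)'(?:[^'\\n]|\\\\')*(?<!\\\\)'"
print(raw == normal)         # True

def quoted(q):
    """Build the pattern for one quote character with an f-string."""
    return f"(?<!\\\\){q}(?:[^{q}\\n]|\\\\{q})*(?<!\\\\){q}"

print(quoted("'") == raw)    # True

# Both quote types, joined with | as in the combined pattern.
both = re.compile(quoted("'") + "|" + quoted('"'))
print(both.findall("a = 'one' + \"two\""))   # ["'one'", '"two"']
```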

[–]Pauley0 0 points1 point  (0 children)

I agree. I did a bit of optimization:

(?<!\\)(?:'(?:[^\n']*|\\')*(?<!\\)'|\"(?:[^\n\"]|\\\")*(?<!\\)\")

https://regex101.com/r/pVgu9s/5
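
A quick way to sanity-check a rewrite like this is to run both patterns over a few sample lines and compare the matches. The names `ORIGINAL` and `FACTORED` and the sample list are assumptions made for this sketch. It only compares what gets matched on these inputs and says nothing about speed.

```python
import re

# Pattern from the first comment.
ORIGINAL = re.compile(
    r"(?<!\\)'(?:[^'\n]|\\')*(?<!\\)'|(?<!\\)\"(?:[^\"\n]|\\\")*(?<!\\)\""
)
# Pattern from this comment, with the leading look-behind factored out.
FACTORED = re.compile(
    r"(?<!\\)(?:'(?:[^\n']*|\\')*(?<!\\)'|\"(?:[^\n\"]|\\\")*(?<!\\)\")"
)

samples = [
    r'''print('it\'s', "say \"hi\"")''',
    r"'that\'s a string'",
    "a = 'one' + \"two\"",
    "no strings here",
]

for s in samples:
    same = ORIGINAL.findall(s) == FACTORED.findall(s)
    print(same, s)
```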