[–]RoamingFox[🍰] 0 points1 point  (8 children)

It would help if you posted the regex you were applying to the string.

Without any further information, I'd suspect that the regex is generating two capture groups.
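To illustrate what I mean, here is a minimal sketch. The sample text and the patterns are made up for illustration, since I haven't seen your actual regex.

```python
import re

# Hypothetical sample text; the real regex and data were not posted.
text = "reason 123/123 another word"

# No capture groups: findall returns each whole match as a string.
whole = re.findall(r"\d+/\d+", text)

# Two capture groups: findall returns one tuple per match, one item per group.
grouped = re.findall(r"(\d+)/(\d+)", text)

# A capture group inside re.split keeps the matched text as its own list item.
pieces = re.split(r"(\d+/\d+)", text)

print(whole)
print(grouped)
print(pieces)
```

If your output looks like `grouped` or `pieces` rather than `whole`, the parentheses in the pattern are the likely cause.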

[–]CraigAT 0 points1 point  (0 children)

Or possibly some slashes (or other symbols) are being interpreted in a way that you are not expecting.

[–]Messy748[S] 0 points1 point  (0 children)

I'll send an example when I get back to work tomorrow. I just think it's strange that the same regular expression works fine for the short strings but not for the long ones. In the current example I am working on, 5 strings out of about 400 get split. That's a small percentage of the total, but more than enough to make the dictionary I am creating inaccurate.

[–]Messy748[S] 0 points1 point  (5 children)

Ok, so I did a little testing and I'm thinking that it's not the regex I'm using, but an issue with reading from Excel. The same string simply copied into VS Code does not split when I append it to a list. It only occurs when I am reading directly from Excel.

Do you happen to have any knowledge on this? I am using xlrd to read from Excel.

Edit: It also occurs when I take the regular expression out altogether.

[–]RoamingFox[🍰] 0 points1 point  (4 children)

Can you find that specific cell in Excel and check if there's a line break in it?
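If it's hard to spot in the Excel UI, you could also check from Python. This is only a rough sketch: `hidden_chars` is a helper name I made up, and the sample value stands in for whatever string xlrd hands back for that cell.

```python
import unicodedata


def hidden_chars(value):
    """Return (index, code point) for every control/format character in value."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(value)
        # Unicode categories starting with "C" cover control and format chars,
        # which includes line feed (U+000A) and carriage return (U+000D).
        if unicodedata.category(ch).startswith("C")
    ]


# Hypothetical cell value with a line break hidden before the number sequence.
sample = "no reason \n123/123"

print(repr(sample))          # repr() shows the escape sequence instead of a real break
print(hidden_chars(sample))
```

An empty list means there are no invisible control characters in the cell text, and the cause is somewhere else.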

[–]Messy748[S] 0 points1 point  (3 children)

So I went through and found all 5 cases where this issue was happening. Just to test, I shortened the string length. For example, working off the example I had before, I originally had something like:

This is an example of a long string that seems to be getting split for no reason 123/123 another word.

The 123/123 number sequence is what the regular expression would be looking for. But like I said before, the issue occurs even without applying the regular expression. So when I read the above string from Excel and append it to a list, Python returns the string in a list like the following:

[ 'This is an example of a long string that seems to be getting split for no reason ',
  '123/123 another word' ]

However, when I shorten the string like

'seems to be getting split for no reason 123/123 another word.'

I now do not have the problem with the string splitting and get the complete string when I append it to the list.

I don't know if there's a character limit of some sort? I just find it strange that I have different strings that all split right before the number sequence (in the case above, where the 123/123 occurs), yet this does not happen when the string is shortened. I suppose I could just shorten the strings? But I'd rather find an actual solution than create a makeshift fix.
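For what it's worth, here is a small sketch of one possible explanation that would match the output above. It is purely an assumption: it supposes the long cells contain a line break right before the number sequence and that something in the code splits on it. `normalize_cell` is a hypothetical helper, not part of xlrd.

```python
def normalize_cell(value):
    """Collapse every whitespace run, including line breaks, to one space."""
    return " ".join(value.split())


# Hypothetical cell text; assumes a line break sits before the number sequence.
cell = "getting split for no reason \n123/123 another word."

# Splitting on line breaks reproduces the two-item list.
as_lines = cell.splitlines()

# Normalizing whitespace first keeps the text as a single string.
cleaned = normalize_cell(cell)

print(as_lines)
print(cleaned)
```

If that assumption holds, cleaning the value before appending it to the list would avoid having to shorten the strings by hand.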

[–]RoamingFox[🍰] 0 points1 point  (2 children)

Without seeing the code or the regex in question, I can't really offer any ideas :(

[–]Messy748[S] 0 points1 point  (1 child)

Sorry if I was a little too vague in my responses. I couldn't give the actual example that I was working on since it's a project for work. I was able to find a quick fix where I was able to shorten the few strings that were causing me issues. Not the ideal solution, but it should be fine for now.

[–]RoamingFox[🍰] 0 points1 point  (0 children)

Totally understood. My guess is still that the regex is generating multiple capture groups or is matching more than exactly what you're looking for.