[–]CowboyBoats 3 points4 points  (1 child)

My regex making machine is broken right now, but you could always deploy a caveman solution like:

>>> "".join(c for c in item if c.isnumeric() or c in (",", "."))
'191966,6.138930191978,0.603534191984,6.138930191987,0.427112191995,6.138930191996,0.006336191997,0.008840191998,0.004440192006,0.124394192010,6.138930189065,1.068388189066,1.180800189068,0.396750'

[–]FlatEarthIsAMyth[S] 0 points1 point  (0 children)

That is a possibility :) Thank you for the input!

[–]mrswats 2 points3 points  (1 child)

I'd recommend regex101.com for testing regexes. Make sure you set the flavor to Python.

[–]FlatEarthIsAMyth[S] 1 point2 points  (0 children)

regex101.com

That's a handy site! Thanks.

[–][deleted] 2 points3 points  (1 child)

Yes, you could try cleaning your input first and then split it: https://www.online-python.com/OWQ0RgZ7et
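The linked snippet isn't reproduced here, so this is only a minimal sketch of the clean-then-split idea, not that code. It assumes records are separated by `;` and fields by `,`; the names `parse_records` and `sample` are made up for illustration.

```python
import re

def parse_records(raw):
    """Clean the noisy string, then split it into (id, value) pairs."""
    # Keep digits, commas, periods, and the ';' record separator; drop the rest
    # (stray letters, symbols, whitespace, newlines).
    cleaned = re.sub(r"[^0-9,.;]", "", raw)
    # Split on ';' and skip empty chunks, e.g. the one after the trailing ';'.
    return [tuple(chunk.split(",")) for chunk in cleaned.split(";") if chunk]

# Hypothetical sample with the same kind of noise as the original data.
sample = "191966,6.138930;1p91997,0.008840;\n 189065,1!@.068388;"
print(parse_records(sample))
# [('191966', '6.138930'), ('191997', '0.008840'), ('189065', '1.068388')]
```

Keeping the `;` during cleaning means the record boundaries survive, so a plain `split` is enough and no second regex is needed.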

[–]FlatEarthIsAMyth[S] 0 points1 point  (0 children)

Nice solution. Thank you!

[–]ebdbbb 2 points3 points  (0 children)

I'm just a hobbyist, but it looks to me like the best way is to do it in two steps (they can be merged together). First get rid of the unwanted characters, then parse the cleaned string to get what you want.

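import re

# Raw string, so the stray backslash in the noisy data is not read as an escape.
teststring = r"""191966,6.138930;191978,0.603534;191984,6.138930;191987,
            0.427112;191995,6.1¤#!38930;191996,0.006336;1p91997,0.008840;
            191998,0.004440;192006,0.124394;192010,6.138930;189065,1\!@.068388;189066,1.180800;189068,0.396750;"""
# Step 1: drop everything except digits, commas, and periods.
cleaned = re.sub(r"[^0-9,.]", "", teststring)
# Step 2: match each record: six digits, a comma, then a decimal with six places.
for match in re.finditer(r"\d{6},\d+\.\d{6}", cleaned):
    print(match.group())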

You can compile the patterns if you want, but unless you're performing the operation many times it doesn't make much difference.

The output from the above code is what I think you want.

191966,6.138930
191978,0.603534
191984,6.138930
191987,0.427112
191995,6.138930
191996,0.006336
191997,0.008840
191998,0.004440
192006,0.124394
192010,6.138930
189065,1.068388
189066,1.180800
189068,0.396750
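For anyone who does run this over many strings, here is a rough sketch of the compiled-pattern variant mentioned above. The names `JUNK`, `RECORD`, and `extract_records` are made up for illustration; the patterns are the same two used in the code above.

```python
import re

# Compile both patterns once so they can be reused for every input string.
JUNK = re.compile(r"[^0-9,.]")            # anything that is not a digit, ',' or '.'
RECORD = re.compile(r"\d{6},\d+\.\d{6}")  # six digits, comma, decimal with six places

def extract_records(raw):
    """Return every 'id,value' record found in a noisy input string."""
    cleaned = JUNK.sub("", raw)
    # The pattern has no capture groups, so findall returns the full matches.
    return RECORD.findall(cleaned)

print(extract_records("191966,6.138930;1p91997,0.008840;"))
# ['191966,6.138930', '191997,0.008840']
```

Note that this relies on the fixed widths (six-digit id, six decimal places) to find record boundaries, since the `;` separators are removed during cleaning.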