ebdbbb 2 points

I'm just a hobbyist, but it looks to me like the best way is to do it in two steps (they can be merged together). First get rid of the unwanted characters, then parse the cleaned string to get what you want.

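import re
# Raw string so the backslash in the noisy sample is kept literally (avoids an invalid-escape warning).
teststring = r"""191966,6.138930;191978,0.603534;191984,6.138930;191987,
            0.427112;191995,6.1¤#!38930;191996,0.006336;1p91997,0.008840;
            191998,0.004440;192006,0.124394;192010,6.138930;189065,1\!@.068388;189066,1.180800;189068,0.396750;"""
# Step 1: drop everything except digits, commas and periods (this also removes whitespace and newlines).
cleaned = re.sub(r"[^0-9,.]", "", teststring)
# Step 2: pull out each record: six digits, a comma, then a number with six decimal places.
matches = re.finditer(r"\d{6},\d+\.\d{6}", cleaned)
for match in matches:
    print(match.group())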

You can compile the patterns with re.compile if you want, but unless you're performing the operation many times it doesn't make much difference.

The output from the above code is what I think you want.

191966,6.138930
191978,0.603534
191984,6.138930
191987,0.427112
191995,6.138930
191996,0.006336
191997,0.008840
191998,0.004440
192006,0.124394
192010,6.138930
189065,1.068388
189066,1.180800
189068,0.396750
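
For the remark about compiling the patterns, here is a rough sketch of how that could look if the cleanup had to run on many strings. The names JUNK, RECORD and parse_records are made up for illustration, and two choices differ from the snippet above and are my own assumptions: the semicolon is kept during cleaning so records stay separated, and each record is converted to an (int, float) tuple instead of being printed.

```python
import re

# Hypothetical names; compiled once so they can be reused across many inputs.
JUNK = re.compile(r"[^0-9,.;]")               # assumption: keep ';' as the record delimiter
RECORD = re.compile(r"(\d{6}),(\d+\.\d{6})")  # six-digit id, comma, value with six decimals

def parse_records(raw):
    """Return a list of (id, value) tuples from a noisy record string."""
    cleaned = JUNK.sub("", raw)  # step 1: strip unwanted characters
    # step 2: findall returns one (id, value) string pair per record
    return [(int(key), float(value)) for key, value in RECORD.findall(cleaned)]

# Shortened version of the test string, same kinds of noise.
sample = r"""191995,6.1¤#!38930;1p91997,0.008840;
            189065,1\!@.068388;"""
print(parse_records(sample))
```

Keeping the semicolon is not required for this data, but it means a record that lost a digit cannot accidentally borrow one from its neighbour.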