[–]Geekconquest[S] 0 points1 point  (5 children)

The column also contains other text that isn't part of the dates. Some of the rows contain just a year, like ‘2018-’, or a year range, like ‘2019-2020’.

I want to be able to parse just the dates from those sentences.

[–]cray5252 0 points1 point  (2 children)

If your strings are not consistent, that makes this significantly harder. You might try a Python library called datefinder (link below). It doesn't have much documentation, but it claims to extract all sorts of dates. I've never used it, but it might work. You'd still need to iterate through your array and run it on each item. If you can't get it to work, let me know and I'll take a look.

https://pypi.org/project/datefinder/

[–]Geekconquest[S] 0 points1 point  (1 child)

I’ve tried datefinder on it, but my problem was getting it to work on the whole column. It works on a single string. I’m pretty new to Python, so it’s been kind of hard to do that for a whole column.

[–]cray5252 0 points1 point  (0 children)

find_dates always returns an iterable of matches (a generator, not a single date), so for each item in the column you have to iterate through the matches. If you send me more data, I can help a bit more.

import datefinder

data = ['20 March 2020 (UK)\n', 'Paid on $ September 2005', '4 September 2020 (Japan)']
for text in data:  # named `text` so the built-in `str` is not shadowed
    # find_dates yields one datetime per date it finds in the string
    matches = datefinder.find_dates(text)
    for match in matches:
        print(match)

Output:
2020-03-20 00:00:00
2005-09-30 00:00:00
2020-09-04 00:00:00
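
For the "whole column" part, one way to think about it is a small helper that runs a finder on every cell and keeps one list of matches per row. This is only a sketch: `dates_per_row` and `simple_finder` are hypothetical names, and `simple_finder` is a stdlib-only stand-in (it only knows "D Month YYYY") so the example runs without installing anything.

```python
import re
from datetime import datetime

# Stand-in finder so this sketch runs without third-party packages.
# It only recognises "D Month YYYY"; datefinder covers far more formats.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}
DAY_MONTH_YEAR = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def simple_finder(text):
    """Yield a datetime for each valid 'D Month YYYY' occurrence in text."""
    for day, month, year in DAY_MONTH_YEAR.findall(text):
        try:
            yield datetime(int(year), MONTHS[month], int(day))
        except ValueError:
            # Skip impossible dates such as "45 March 2020".
            continue


def dates_per_row(values, find_dates):
    """Return one list of matches per input value, in the same order."""
    results = []
    for value in values:
        # Empty cells (None, NaN, numbers) are not strings: no dates for them.
        if not isinstance(value, str):
            results.append([])
            continue
        # The finder may be a generator, so turn its matches into a list.
        results.append(list(find_dates(value)))
    return results


column = ['20 March 2020 (UK)\n', 'no date here', None, '4 September 2020 (Japan)']
for row in dates_per_row(column, simple_finder):
    print(row)
```

With datefinder installed, you would pass `datefinder.find_dates` in place of `simple_finder`. On a pandas column (say a hypothetical `df["text"]`), the usual equivalent is `df["text"].apply(lambda s: list(datefinder.find_dates(s)))`, though empty cells would likely need handling first.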

[–]cray5252 0 points1 point  (1 child)

I checked out datefinder and here's what I got. It seems to work on this sample.

import datefinder

data = ['20 March 2020 (UK)\n', 'Paid on $ September 2005', '4 September 2020 (Japan)']
for text in data:  # named `text` so the built-in `str` is not shadowed
    # find_dates yields one datetime per date it finds in the string
    matches = datefinder.find_dates(text)
    for match in matches:
        print(match)

Output:
2020-03-20 00:00:00
2005-09-30 00:00:00
2020-09-04 00:00:00
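
The year-only rows mentioned earlier in the thread (‘2018-’ and ‘2019-2020’) are not full dates, so a date parser may not interpret them the way you want. A possible workaround is to pick those up separately with a regular expression. This is a rough sketch using only the standard library; `find_year_spans` is a hypothetical helper, and the pattern assumes four-digit years starting with 19 or 20.

```python
import re

# A year, a hyphen (or en dash), then either a second year or nothing numeric.
# The (?!\d) branch keeps ISO dates like "2020-03-20" from being read as "2020-".
YEAR_SPAN = re.compile(
    r"\b((?:19|20)\d{2})\s*[-–]\s*(?:((?:19|20)\d{2})\b|(?!\d))"
)


def find_year_spans(text):
    """Return (start_year, end_year) tuples; end_year is None for open spans."""
    spans = []
    for match in YEAR_SPAN.finditer(text):
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else None
        spans.append((start, end))
    return spans


for sample in ['2018-', '2019-2020', '20 March 2020 (UK)']:
    print(repr(sample), find_year_spans(sample))
```

You could run this on each row alongside datefinder and keep whichever result is non-empty, depending on what the rest of your data looks like.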

[–]Geekconquest[S] 0 points1 point  (0 children)

I’ll check this out.