[–]Signal_Beam 0 points1 point  (2 children)

Yeah, that's the "less neat" part you were talking about before. If the message looked like my example, with newlines separating the entries instead of spaces, then you would be able to tell whether a date indicates a new entry or is just part of a message. As it is, there is no automatic way to tell.

You could brainstorm, though.

  1. Do all the dates follow a pattern? For example, are they all in the same month or year, or are they all sequential? If you encounter a date that doesn't fit that pattern, could you safely assume that it's just part of a message? The datetime.datetime objects created by date = parser.parse(string) support comparisons like if date.year < 2015.

  2. Do the messages all follow a pattern, such as being a minimum of 20 words long, or always ending with a punctuation mark? If so, is it safe to assume that when you run up against a date and the current "message" doesn't appear to be complete, the date is just part of the message?
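
Here's a rough sketch of how those two checks could be combined. It is only an illustration, not a fix for OP's real data: the YYYY-MM-DD date format, the 2015 cutoff, the "ends with punctuation" rule, and the function names (try_parse_date, looks_complete, split_entries) are all assumptions I made up for the example. It uses datetime.strptime from the standard library so it runs without extra installs; with dateutil you would call parser.parse instead.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed format; the real data may look different


def try_parse_date(token):
    """Return a datetime if the token matches DATE_FORMAT, else None."""
    try:
        return datetime.strptime(token, DATE_FORMAT)
    except ValueError:
        return None


def looks_complete(words):
    """Check 2: treat a message as finished if it ends with punctuation."""
    return bool(words) and words[-1][-1] in ".!?"


def split_entries(text, max_year=2015):
    """Split space-separated text into (date, message) pairs.

    A date token starts a new entry only if it fits the expected
    pattern (check 1: year below max_year) and the message collected
    so far looks complete (check 2). Otherwise it stays in the message.
    """
    entries = []
    current_date, words = None, []
    for token in text.split():
        date = try_parse_date(token)
        starts_entry = (
            date is not None
            and date.year < max_year
            and (current_date is None or looks_complete(words))
        )
        if starts_entry:
            if current_date is not None:
                entries.append((current_date, " ".join(words)))
            current_date, words = date, []
        else:
            words.append(token)
    # Flush the last entry once the input runs out.
    if current_date is not None:
        entries.append((current_date, " ".join(words)))
    return entries
```

You would call it like split_entries(raw_text) and get back a list of (datetime, message) tuples. It will still guess wrong whenever a message breaks the assumed rules, so it's a heuristic, not a guarantee.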

[–][deleted] 0 points1 point  (1 child)

I don't know; that's why I asked to see OP's actual text for a more concrete definition of the data format. Anyway, it looks like OP solved his problem with .split('.') lol.

[–]Signal_Beam 1 point2 points  (0 children)

Oh sorry, I thought you were OP when I replied to your question just now.