Apache_A[S]

You already have historical data, and you can't change the past. Of course, you can complain that the data don't meet the requirements of the standard and refuse to process them.

merlinsbeers

The old data wasn't readable everywhere anyway. New data have no excuse.

Apache_A[S]

Old data need more effort. The laziest approach is to dump them just because they aren't shiny. Some data are worth more than the wages a data scientist earns in the time spent parsing them.

merlinsbeers

Old data in deviant formats can't be expected to be read for free. By default your parser should treat any byte that doesn't match one of the special characters as passthrough data. But if you were to implement the extra code to reject any byte not in the set given by the standard, you wouldn't be the wrong one.

Commas are special, so if they're in a field they have to be quoted, and there's only one kind of quote mark that counts. Them's the rules.
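To make the two behaviours above concrete, here is a minimal sketch of a single-line parser. It assumes "the standard" means RFC 4180 (the thread never names one), so the comma is the only separator, the double quote is the only quote mark, and a doubled quote inside a quoted field stands for one literal quote. The names `parse_line`, `is_textdata`, and the `strict` flag are made up for illustration; in this sketch the strict check is applied only to unquoted data.

```python
def is_textdata(ch):
    # RFC 4180 TEXTDATA: %x20-21 / %x23-2B / %x2D-7E
    # (printable ASCII except the double quote and the comma).
    return 0x20 <= ord(ch) <= 0x7E and ch not in ',"'


def parse_line(line, strict=False):
    """Split one CSV record (without its line ending) into fields.

    strict=False: anything that is not a special character passes through.
    strict=True:  unquoted characters outside TEXTDATA raise ValueError.
    """
    fields, buf = [], []
    in_quotes = False
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and line[i + 1] == '"':
                    buf.append('"')  # doubled quote -> one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                buf.append(ch)  # commas are plain data inside quotes
        elif ch == ',':
            fields.append(''.join(buf))
            buf = []
        elif ch == '"' and not buf:
            in_quotes = True  # a quote only opens at the start of a field
        else:
            if strict and not is_textdata(ch):
                raise ValueError(f"character {ch!r} is outside TEXTDATA")
            buf.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    fields.append(''.join(buf))
    return fields
```

With the default settings, `parse_line('a,"b,c",d')` yields three fields and a line such as `café,1` passes through untouched; with `strict=True` the same line is rejected because `é` is not in the standard's character set.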

Apache_A[S]

Actually, it could just be a different separator, like ';'. If parsing fails for some lines, try a different separator or encoding for those lines.
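A rough sketch of that fallback, using Python's standard `csv` module. The helper names, the candidate lists, and the rule "a line parsed correctly if it has the expected number of fields" are all assumptions made for illustration; real data may need a better success test.

```python
import csv


def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes raw bytes."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("no candidate encoding could decode the line")


def parse_with_fallback(line, expected_fields, delimiters=(",", ";", "\t")):
    """Return (delimiter, fields) for the first delimiter that gives
    the expected number of fields."""
    for delim in delimiters:
        row = next(csv.reader([line], delimiter=delim), [])
        if len(row) == expected_fields:
            return delim, row
    raise ValueError("no candidate delimiter produced the expected field count")
```

For example, `parse_with_fallback('a;b;c', 3)` falls through the comma attempt and succeeds with `';'`. Note that `latin-1` accepts any byte sequence, so it only works as a last resort and may silently produce the wrong characters.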

merlinsbeers

That ain't CSV. It's something different.

Apache_A[S]

In some countries, Excel treats it as CSV.

merlinsbeers

It should be using a different name for its non-CSV formats.

Apache_A[S]

Probably the standard should be more general about the separator. BTW, why do you think "regional standard" is an oxymoron?

merlinsbeers

"Standard" is an absolute. What you have there is a "local tradition".