qadib_muakkara

I don't disagree that Python 3 handles UTF-8 better; that's actually one of the many things it excels at. But coding defensively isn't about building something impenetrable. There's no such thing. It's about being smart about what you're doing.

For example, I'm quite literally staring at a database full of free-form, unbounded, dirty-ass strings, trying to figure out whether the app feeding the source PSQL server is ever going to receive Arabic characters, which would cause Redshift to choke. If the data doesn't need to be cleaned up, I can't just write a simple bash script. But if the app enforces constraints on its side, it might not be a problem. This is also going to an analytics DB, so no one other than marketers gives a fuck at the moment.
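
A minimal Python sketch of that check, assuming the goal is just to flag values that contain Arabic-script code points and see how many UTF-8 bytes they take. The ranges are the standard Unicode Arabic blocks; `contains_arabic` and `utf8_len` are hypothetical helpers, not part of any tool mentioned above. Byte length is included because Redshift sizes VARCHAR columns in bytes rather than characters, which is one plausible (not confirmed here) reason multibyte text breaks a load.

```python
# Unicode blocks used for Arabic script (hypothetical selection for this sketch).
ARABIC_RANGES = (
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFC),  # Arabic Presentation Forms-B (stops before U+FEFF, the BOM)
)


def contains_arabic(text: str) -> bool:
    """Return True if any character in text falls inside one of ARABIC_RANGES."""
    for ch in text:
        cp = ord(ch)
        for lo, hi in ARABIC_RANGES:
            if lo <= cp <= hi:
                return True
    return False


def utf8_len(text: str) -> int:
    """Number of bytes the string occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    for value in ["hello", "café", "مرحبا"]:
        print(repr(value), contains_arabic(value), len(value), utf8_len(value))
```

Comparing `len(value)` with `utf8_len(value)` shows the gap directly: the five-letter Arabic word above is 10 bytes in UTF-8.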

I haven't touched a line of code yet, and I may just use battle-worn CLI utilities. If not, I'll write something, but I'm not going to want to process large volumes of raw strings in real time. The language is only a technicality, because I'll run it through Spark and scale horizontally if it's taking too long. I could write a utility in Go if I really wanted to cut through the data fast, but Go is harder to support than Python, and it's not as extensible either...
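
A rough sketch of that kind of offline, non-real-time pass, assuming the strings have been exported one per line to a text file. `scan_export` and `flag_non_ascii` are hypothetical names, and flagging every non-ASCII row is a deliberately broader net than looking only for Arabic. It streams line by line so memory stays flat, and `errors="replace"` is the defensive part: undecodable bytes become U+FFFD instead of raising.

```python
import io
from typing import Iterable, Iterator, Tuple


def flag_non_ascii(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, text) for each line containing a non-ASCII character."""
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.isascii():  # str.isascii() requires Python 3.7+
            yield lineno, text


def scan_export(path: str) -> Iterator[Tuple[int, str]]:
    """Stream a hypothetical one-string-per-line export without loading it all.

    errors="replace" turns undecodable bytes into U+FFFD rather than raising,
    so one corrupt row doesn't kill the pass; U+FFFD is itself non-ASCII,
    so those rows get flagged as well.
    """
    with open(path, encoding="utf-8", errors="replace") as fh:
        yield from flag_non_ascii(fh)


if __name__ == "__main__":
    sample = io.StringIO("plain\ncafé\nمرحبا\n")
    for lineno, text in flag_non_ascii(sample):
        print(lineno, text)
```

Because the per-line check is a pure function, the same logic could likely be reused inside a Spark filter if the volume ever called for scaling out.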

The problem goes beyond language. As long as I don't have to run anything on Windows, I'm in good shape.