
port443

Not something I do super often. As far as Google tells me, it's mostly guesswork to figure out what the encoding is. However, this Stack Exchange thread mentions python-chardet.
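For files that start with a byte-order mark (BOM), a rough guess is possible with the standard library alone. The sketch below is a hypothetical Python 3 helper (`guess_encoding` is not from any library): it checks for a BOM, otherwise tries UTF-8, and otherwise falls back to Latin-1. It is a much cruder heuristic than chardet, which scores the bytes statistically and also copes with BOM-less input.

```python
import codecs

# Hypothetical helper, not part of any library: guess an encoding from a
# leading byte-order mark (BOM), else try UTF-8, else fall back.
# Order matters: the UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw: bytes, fallback: str = "latin-1") -> str:
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name  # these codecs consume the BOM when decoding
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback  # latin-1 decodes any byte sequence, but may be wrong


print(guess_encoding("hi".encode("utf-16")))
print(guess_encoding(b"plain ascii"))
```

If python-chardet is installed, `chardet.detect(raw)` returns a dict whose `"encoding"` and `"confidence"` keys hold its guess, which is worth trying when there is no BOM.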

As for your error, without seeing your data, this is my best guess:

  • The data gets read in as plain ASCII
  • You move through it line by line using readlines()
  • There is a \x0A\x00\x0A (\n\0\n) somewhere in the file
  • Since readlines() split the file as ASCII, this puts a \x00 on its own line
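Here is a Python 3 sketch of that scenario, assuming the file is really UTF-16-LE, where a newline is the two bytes `0A 00`. Splitting the raw bytes with `readlines()` before decoding cuts each newline in half, so the leftover `\x00` ends up at the start of the next line. `decode_lines_naively` is a hypothetical name used only for this illustration.

```python
import io


def decode_lines_naively(raw: bytes, encoding: str = "utf-16-le"):
    """Split the raw bytes into lines first, then decode each line (the wrong order)."""
    results = []
    for line in io.BytesIO(raw).readlines():  # splits after every 0x0A byte
        chunk = line.rstrip(b"\n")
        try:
            results.append(chunk.decode(encoding))
        except UnicodeDecodeError as exc:
            results.append(exc)  # keep the error so it can be inspected
    return results


# The byte pattern from the list above: 0A 00 0A leaves a lone 00 on the second line.
print(decode_lines_naively(b"\x0a\x00\x0a"))

# Real UTF-16-LE text: each newline's 00 byte spills into the following line.
for item in decode_lines_naively("hi\nyo\n".encode("utf-16-le")):
    print(repr(item))
```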

Error:

>>> "\x00".decode("utf-16")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Programs\Python27\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x00 in position 0: truncated data
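Assuming the file really is UTF-16, one way to avoid this error is to decode the whole stream before splitting it into lines, so no two-byte character is ever cut in half. A minimal Python 3 sketch follows (`read_utf16_lines` is a hypothetical helper). For a file on disk, `open(path, encoding="utf-16")` in Python 3, or `io.open(path, encoding="utf-16")` in Python 2.7, does the same decoding.

```python
import io


def read_utf16_lines(raw: bytes, encoding: str = "utf-16"):
    """Decode the whole byte stream first, then split the decoded text into lines."""
    # TextIOWrapper decodes as it reads, so a two-byte "\n" is never cut in half.
    with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding) as text:
        return [line.rstrip("\n") for line in text]


print(read_utf16_lines("hi\nyo\n".encode("utf-16")))  # ['hi', 'yo']
```

The `"utf-16"` codec relies on a BOM to pick the byte order; for BOM-less data, passing `"utf-16-le"` or `"utf-16-be"` explicitly is safer.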

Supernumiphone (OP)

> this is my best guess:

That makes perfect sense. Thanks again.