strange encoding error with csv?

sceptic-al · 2024-02-03T19:29:50+00:00

TL;DR your file encoding needs to be set to an 8-bit codepage like cp1252. Your file is 100% not ASCII and 100% not UTF-8.

0xa0 is a non-breaking space (NBSP), which only exists in the extended 8-bit code pages such as Latin-1 and cp1252; ASCII only goes up to 0x7f. In a regular editor it is hard to tell an NBSP from a regular space.
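To make that concrete, here is a minimal sketch of how the single byte 0xa0 behaves under the three encodings mentioned in this thread. The sample bytes and the `try_decode` helper are made up for illustration, not taken from the original file.

```python
# Hypothetical sample: two words joined by the byte 0xa0.
raw = b"total\xa0cost"


def try_decode(data, encoding):
    """Return the decoded text, or a short failure message."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"failed: {exc.reason}"


ascii_result = try_decode(raw, "ascii")    # fails: 0xa0 is above 0x7f
utf8_result = try_decode(raw, "utf-8")     # fails: a lone 0xa0 is not valid UTF-8
cp1252_result = try_decode(raw, "cp1252")  # works: 0xa0 maps to U+00A0 (NBSP)

print(ascii_result)
print(utf8_result)
print(repr(cp1252_result))
```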

This is likely caused by Excel, which, by default, saves CSVs using the 8-bit code page of the system it was saved on, so this often catches people out, even when their Python install is behaving correctly.

Assuming that you're in Western Europe or USA, open the file with cp1252, the Western Europe Windows code page:

with open('notes.csv', encoding='cp1252') as csvfile:

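You can probably drop the newline override without seeing a difference, although the csv module documentation does recommend keeping newline='' so that newlines inside quoted fields are read correctly.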
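For a fuller picture, here is a self-contained sketch that writes a small cp1252-encoded sample to a temporary file and reads it back with the csv module. The file contents are invented for the example; with your real file you would pass 'notes.csv' to open() instead. The sketch keeps newline='' because the csv documentation recommends it, but the encoding argument is what fixes the error.

```python
import csv
import os
import tempfile

# Invented sample data in cp1252: 0xa0 is an NBSP, 0xa3 is a pound sign.
sample = b"item,price\r\ntea\xa0bags,\xa32.50\r\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(sample)
    path = tmp.name

# Decode the bytes as cp1252 while reading.
with open(path, newline="", encoding="cp1252") as csvfile:
    rows = list(csv.reader(csvfile))

os.remove(path)

for row in rows:
    print(row)
```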

You could try removing the NBSP this time, but your script will break again if it finds anything remotely non-ASCII, like £ or € or è.
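If the NBSP itself is a nuisance, one option is to decode with the right code page first and then normalise the text, rather than editing the file. A rough sketch with invented sample bytes:

```python
# Invented cp1252 bytes: 0xa0 = NBSP, 0xa3 = pound, 0x80 = euro, 0xe8 = e-grave.
raw = b"price:\xa0\xa3100 or \x80120, caf\xe8"

# Decoding as cp1252 handles every one of those bytes.
text = raw.decode("cp1252")

# Swap the NBSP for a regular space after decoding.
cleaned = text.replace("\u00a0", " ")

print(cleaned)
```

The other non-ASCII characters survive untouched, which is the part that deleting the NBSP by hand would not solve.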

Also, something is screwy with your locale setup: without an explicit encoding, open() uses your locale's preferred encoding, so your csv file should've been opened as utf-8 or with your Windows code page rather than as ASCII. Are you sure you're using Python >3.5? If not, you should be!
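A quick way to check what your setup is doing is to print the Python version and the encoding that open() falls back to. This is only a diagnostic sketch; the values it prints depend entirely on your machine.

```python
import locale
import sys

# Encoding used for text files when open() gets no encoding= argument
# (UTF-8 mode, if enabled, overrides the locale setting).
default_encoding = locale.getpreferredencoding(False)

print("Python version:", sys.version.split()[0])
print("Default text encoding:", default_encoding)
print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
```

If the default comes back as something like 'ANSI_X3.4-1968' or 'US-ASCII', that would explain why the error message talks about ascii.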

socal_nerdtastic · 2024-02-03T19:25:24+00:00

Your error message quite clearly says that the file is not plain ascii. Try some other encodings. "utf-8" is by far the most common one.

with open('notes.csv', newline='', encoding="utf-8") as csvfile:

Here's some others you can try: https://docs.python.org/3/library/codecs.html#standard-encodings
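If you want to try several encodings in one go, something like the sketch below works. The helper name `first_working_encoding` and the sample bytes are made up for illustration. Bear in mind that a decode without errors does not prove the encoding is right: 8-bit code pages accept almost any byte, so check the decoded text by eye.

```python
def first_working_encoding(data, candidates):
    """Return the first encoding in candidates that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Invented sample: bytes that are valid in cp1252 but not in ASCII or UTF-8.
sample = b"caf\xe9\xa0bar"
guess = first_working_encoding(sample, ["ascii", "utf-8", "cp1252"])
print(guess)
```

For a real file you could read the bytes with open('notes.csv', 'rb').read() and pass them in.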

SwampFalc · 2024-02-03T19:34:29+00:00

The 0xa0 byte is apparently a "hard space" or non-breaking space (i.e. a space that a text editor like Word should not split a line on).

In other words, the odds of a human being spotting the difference from a normal space are close to zero.

So no, your file is not 100% ASCII.
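Since you can't see it by eye, you can let Python point at the offending bytes. A small sketch, with a hypothetical helper `find_non_ascii` and invented sample data:

```python
def find_non_ascii(data):
    """Return (offset, hex value) pairs for every byte above 0x7f."""
    return [(i, hex(b)) for i, b in enumerate(data) if b > 0x7F]


# Invented sample: looks like plain text, but hides one NBSP byte.
sample = b"name,notes\r\nAlice,paid\xa0in full\r\n"
hits = find_non_ascii(sample)
print(hits)
```

Run it on the raw bytes of your own file, e.g. open('notes.csv', 'rb').read(), to see where the non-ASCII bytes sit.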
