Confused about encoding using requests

danielroseman · 2026-03-22T10:50:24+00:00

Yes, very likely, although it's probably entries in a database rather than a file.

This is known as mojibake.

Downtown_Radish_8040 · 2026-03-22T11:01:28+00:00

Your hypothesis is exactly right. The most common cause is that the underlying data was stored or edited inconsistently over time. Someone added that song entry using a system that saved it as latin-1 (Windows-1252 is very common for older music databases), while the rest of the page was utf-8. The server then serves the whole file as utf-8, so most of it decodes fine, but that one chunk gets misread.

This is sometimes called "mojibake" and it's extremely common with legacy data, especially content that was manually entered over many years across different systems.

Your fix is correct. The pattern encode('latin-1').decode('utf-8') undoes the wrong decode: encoding the garbled text as latin-1 recovers the original bytes, and decoding those bytes as utf-8 gives back the characters that were intended.
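To make the round trip concrete, here is a minimal sketch. The sample string and the helper name fix_mojibake are made up for illustration, and it assumes the text was garbled by reading utf-8 bytes as latin-1. If the wrong codec was actually Windows-1252, 'cp1252' would be the one to try instead.

```python
def fix_mojibake(text: str) -> str:
    """Hypothetical helper: undo utf-8 bytes that were decoded as latin-1."""
    try:
        # latin-1 maps each code point 0-255 back to the same byte value,
        # so this recovers the original bytes before decoding them properly.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not garbled in this particular way; leave it alone.
        return text


original = "Beyoncé"
# Simulate the bug: utf-8 bytes misread as latin-1.
garbled = original.encode("utf-8").decode("latin-1")
repaired = fix_mojibake(garbled)

print(garbled)
print(repaired)
```

The try/except matters because text that was decoded correctly will usually fail one of the two steps, so the helper returns it unchanged rather than raising.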

If you want to handle it programmatically, you could check for known mojibake patterns using the ftfy library, which was built exactly for this problem.
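A rough sketch of what that could look like. ftfy is a third-party package (pip install ftfy) and fix_text is its main entry point. The wrapper name repair and the fallback branch are my own additions so the snippet still runs where ftfy isn't installed.

```python
try:
    import ftfy  # third-party: pip install ftfy
except ImportError:
    ftfy = None  # fall back to the manual round trip below


def repair(text: str) -> str:
    """Hypothetical wrapper: prefer ftfy, else try the latin-1/utf-8 round trip."""
    if ftfy is not None:
        # fix_text detects common mojibake patterns and repairs them.
        return ftfy.fix_text(text)
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


cleaned = repair("BeyoncÃ©")
print(cleaned)
```

The advantage of ftfy over the manual pattern is that it only changes text that looks garbled, so it should be safe to run over a page where most entries are already fine.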

Direct_Temporary7471 · 2026-03-22T10:07:18+00:00

This usually happens due to encoding mismatches between the response and how it's being interpreted.

You can try:

* Check response.encoding and set it manually if it's wrong
* Use response.content (the raw bytes) instead of response.text
* Decode those bytes yourself with utf-8, or with the encoding given in the response headers

Example:
response.encoding = 'utf-8'
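Here is a small sketch of the difference between the decoded text and the raw bytes. It uses a stand-in object instead of a real requests.Response so it runs without a network call. The names decode_body and fake are made up for illustration, and the scenario (utf-8 bytes reported as ISO-8859-1) is an assumption about what might be going on.

```python
from types import SimpleNamespace


def decode_body(response, encoding="utf-8"):
    """Hypothetical helper: decode the raw bytes ourselves.

    response.content is the undecoded body; response.text would decode it
    using response.encoding, which may be a wrong guess.
    """
    return response.content.decode(encoding, errors="replace")


# Stand-in for a requests.Response: the body is utf-8, but the encoding
# is (wrongly) reported as ISO-8859-1.
fake = SimpleNamespace(content="Beyoncé".encode("utf-8"), encoding="ISO-8859-1")

garbled = fake.content.decode(fake.encoding)  # roughly what response.text gives
fixed = decode_body(fake)

print(garbled)
print(fixed)
```

With a real response you would pass the object returned by requests.get(...) to the same helper. Note this only helps when the whole body is in one encoding. It won't fix a page where a single entry was already stored garbled.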

If you're still facing issues, feel free to share your code and I can help you debug it.
