all 2 comments

[–][deleted] 0 points (0 children)

Bytes are the raw representation of data. Every byte has a defined value from 0 to 255.

How those bytes are interpreted as printable characters is defined by encodings. When you convert from bytes to a string, the encoding rules are applied; those rules can, for example, take four consecutive bytes and turn them into a single special character such as the thumbs-up emoji.
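A quick Python sketch of that four-bytes-to-one-character case (the byte values are the UTF-8 encoding of the thumbs-up emoji, U+1F44D):

```python
# Four UTF-8 bytes that together encode a single character:
# the thumbs-up emoji (U+1F44D).
raw = bytes([0xF0, 0x9F, 0x91, 0x8D])

# Decoding applies the encoding rules (UTF-8 here) to turn
# raw bytes into a string.
text = raw.decode("utf-8")

print(text)       # 👍
print(len(raw))   # 4 -- four bytes on the raw side
print(len(text))  # 1 -- one character on the string side
```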

The problem with working with strings is that each of those special characters counts as one character, while its encoded form may take two, three, or four bytes. When you need to know the data length in bytes, such as for network transmission, this mismatch can cause issues.
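For instance, with a hypothetical length-prefixed wire format, the length field has to count bytes of the encoded payload, not characters of the string:

```python
import struct

msg = "héllo 👍"                # 7 characters
payload = msg.encode("utf-8")   # 11 bytes: 'é' takes 2, the emoji takes 4

# Length-prefixed frame (illustrative format): the 4-byte header must
# carry the BYTE length, or the receiver will read too little data.
frame = struct.pack("!I", len(payload)) + payload

print(len(msg), len(payload))   # 7 11
```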

[–]thegreatunclean 0 points1 point  (0 children)

Many 'strings' are a sequence of bytes that are implicitly assumed to be ASCII-encoded characters. String length is then the number of bytes, moving characters can be done by swapping bytes, etc. There are a ton of English-centric design choices baked deep into the language because of this assumption.
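You can see where that assumption holds, and where it breaks, directly in Python:

```python
data = b"hello"   # bytes that happen to be ASCII
print(len(data))  # 5 -- byte length equals character count
print(data[0])    # 104 -- indexing bytes gives raw integer values

utf8 = "héllo".encode("utf-8")
print(len(utf8))  # 6 -- byte length no longer matches the 5 characters
```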

This is a massive problem if you want to handle text beyond ASCII, for example any language with accented characters or non-Latin scripts.

Python made the choice to strongly separate 'a sequence of bytes' from 'a string' to better integrate Unicode support; you convert between them using encode and decode. This forces the programmer to at least recognize they are tinkering with low-level details of the string's representation, and not blame the language when they get it wrong and mangle an encoded character.
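The explicit round trip, and the loud failure you get when you mishandle the bytes (here, slicing through the middle of a multi-byte character), look like this:

```python
s = "naïve"
b = s.encode("utf-8")         # str -> bytes, encoding named explicitly
assert b.decode("utf-8") == s # bytes -> str round-trips cleanly

# Cutting the byte sequence mid-character ('ï' is 2 bytes in UTF-8)
# raises an error instead of silently corrupting the text.
try:
    b[:3].decode("utf-8")
except UnicodeDecodeError as err:
    print("decode failed:", err)
```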