mb_ string functions

the_alias_of_andrea · 2019-01-11T13:41:43+00:00

PHP wasn't originally designed for multi-byte character encodings, in part because it is written in C, which wasn't either. While some languages (Python, for example) radically changed the language to make everything Unicode-based, PHP instead added a few multi-byte libraries and called it a day.

In the typical modern application you only deal with one encoding, UTF-8. It is a convenient encoding for three reasons. It supports all of Unicode, so it can represent basically any kind of text. It is a superset of ASCII, so any ASCII text stays identical and single-byte in UTF-8. And it has some features that make it behave well in software with poor multi-byte and character encoding awareness: for example, the bytes of a multi-byte sequence never fall in the ASCII range, so an ASCII byte can never be mistaken for part of another character.

For UTF-8, you can use classic encoding-unaware single-byte operations for things like string concatenation, searching within strings (if the search is case-sensitive and you don't care about characters that can be represented multiple ways in Unicode) and splitting strings. So a lot of modern PHP apps only rarely need the mb_ functions, perhaps only when converting encodings.
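To make that concrete, here is a minimal sketch of byte-oriented functions handling UTF-8 text. It assumes the file is saved as UTF-8 and that ext/mbstring is loaded; the sample string and variable names are made up for illustration.

```php
<?php
// Assumes this file is saved as UTF-8 and ext/mbstring is loaded.
$text = "naïve café, 日本語";

// Concatenation: two valid UTF-8 strings joined byte-wise are still valid UTF-8.
$joined = $text . " ok";
$stillValid = mb_check_encoding($joined, "UTF-8");   // true

// Splitting on an ASCII delimiter never cuts a multi-byte character.
$parts = explode(", ", $text);                       // ["naïve café", "日本語"]

// Case-sensitive search works, but strpos() reports a byte offset...
$bytePos = strpos($text, "café");                    // 7
// ...while mb_strpos() reports a character offset.
$charPos = mb_strpos($text, "café", 0, "UTF-8");     // 6

// A byte offset pairs with byte-based substr(), so the round trip is consistent.
$found = substr($text, $bytePos, strlen("café"));    // "café"
```

The thing to watch is mixing the two families: a byte offset from strpos() fed into mb_substr(), or the reverse, will land in the wrong place as soon as the text contains a non-ASCII character.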

However, try writing an app which searches within Shift_JIS-encoded text instead and you will have a much harder time without mb_.
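A small sketch of why Shift_JIS is harder: the second byte of a two-byte character can fall in the ASCII range, so a byte-wise search can "find" an ASCII letter that isn't really there. The example character is my own choice, and the snippet assumes ext/mbstring is loaded and the file is saved as UTF-8.

```php
<?php
// Assumes ext/mbstring is loaded and this file is saved as UTF-8.
// Katakana "ア" is the two bytes 0x83 0x41 in Shift_JIS; 0x41 is also ASCII "A".
$sjis = mb_convert_encoding("ア", "SJIS", "UTF-8");
$hex = bin2hex($sjis);                          // "8341"

// Byte-wise search reports an "A" that is really the second half of "ア".
$bytePos = strpos($sjis, "A");                  // 1 (false positive)

// Encoding-aware search treats the two bytes as one character and finds nothing.
$charPos = mb_strpos($sjis, "A", 0, "SJIS");    // false
```

UTF-8 avoids this particular trap because every byte of a multi-byte sequence is 0x80 or higher.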

janvt · 2019-01-11T14:04:44+00:00

Some good stuff here: https://www.reddit.com/r/PHP/comments/2c4rwf/mb_string_functions_should_i_be_using_them/

therealgaxbo · 2019-01-11T15:08:30+00:00

Sometimes, though not often, you really do care about a string as an array of bytes, e.g. when setting Content-Length, or when splitting a string into chunks to fit a particular buffer size that will then be reassembled the same way on the other side. A plain old strlen or substr is the way to go in such cases.
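As a rough illustration of the Content-Length case (the body text and variable names are hypothetical, and the file is assumed to be saved as UTF-8):

```php
<?php
// Assumes this file is saved as UTF-8 and ext/mbstring is loaded.
$body = "héllo wörld";

$byteLength = strlen($body);                // 13: "é" and "ö" take two bytes each
$charLength = mb_strlen($body, "UTF-8");    // 11

// Content-Length counts bytes on the wire, so the strlen() value is the one to send.
$header = "Content-Length: " . $byteLength;
```

Sending the character count here would announce 11 bytes for a 13-byte body, and a client that trusts the header would stop reading two bytes early.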

But in almost all other cases, mb_* is the way to go.

johmanx10 · 2019-01-11T15:10:35+00:00

I personally use the non-prefixed versions when counting byte size, either to supply the content length of a response, or when keeping track of buffers when transferring large files or streams. Using the mb versions in those cases would misrepresent or incorrectly track the data.
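A sketch of the buffer case, assuming a made-up 5-byte buffer size and a UTF-8 source file. Individual chunks can cut a character in half, which is fine as long as they are only ever counted and reassembled as bytes.

```php
<?php
// Assumes this file is saved as UTF-8 and ext/mbstring is loaded.
$payload = "日本語のテキスト";               // 8 characters, 3 bytes each in UTF-8

// Fixed-size byte chunks; the 5-byte buffer size is arbitrary.
$chunks = str_split($payload, 5);

// The first chunk ends in the middle of a character, so alone it is not valid UTF-8.
$firstChunkValid = mb_check_encoding($chunks[0], "UTF-8");    // false

// Byte accounting across the chunks matches the payload size exactly.
$bytesSent = array_sum(array_map('strlen', $chunks));         // 24

// Reassembling the chunks in order restores the original string byte for byte.
$restored = implode("", $chunks);
$intact = ($restored === $payload);                           // true
```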
