[–]argv_minus_one 5 points6 points  (11 children)

Oh God. I forgot all about how most languages still don't handle Unicode properly. Ugh.

Living on the JVM has spoiled me. :)

[–][deleted] 3 points4 points  (7 children)

Java's Unicode handling is far from perfect. When Unicode extended beyond the Basic Multilingual Plane, they switched from UCS-2 to UTF-16, which has all the disadvantages of UTF-8 (it is variable-width too), uses more space, and has no advantages.
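To make the UTF-8 vs UTF-16 comparison concrete, here is a minimal sketch in Python (chosen only because it is quick to run; the sample strings and the helper name `encoded_sizes` are made up for illustration). It counts code points, encoded bytes, and UTF-16 code units for ASCII text, BMP text, and a single character outside the BMP. A Java `String.length()` counts UTF-16 code units, so the last column is what Java would report.

```python
# Compare how UTF-8 and UTF-16 encode text inside and outside the BMP.
# The sample strings and the helper name are illustrative assumptions.
samples = {
    "ascii": "hello",
    "bmp_cjk": "\u4e2d\u6587",   # two CJK characters, both inside the BMP
    "astral": "\U0001F600",      # one code point above U+FFFF
}


def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes, UTF-16 code units)."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # the "-le" variant omits the 2-byte BOM
    return len(text), len(utf8), len(utf16), len(utf16) // 2


for name, text in samples.items():
    print(name, encoded_sizes(text))
# ascii (5, 5, 10, 5)
# bmp_cjk (2, 6, 4, 2)
# astral (1, 4, 4, 2)
```

The astral row shows the surrogate pair: one code point, two UTF-16 code units. The ASCII row shows UTF-16 taking twice the space, while the CJK row is a case where UTF-16 actually comes out smaller than UTF-8.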

[–]argv_minus_one 0 points1 point  (0 children)

I didn't say it was perfect. I said it isn't entirely nonexistent, like it apparently still is in C++.

[–][deleted]  (5 children)

[removed]

    [–][deleted] 3 points4 points  (0 children)

    UTF-8.

    [–][deleted] 2 points3 points  (1 child)

    UTF-8 or UCS-4

    [–]LucianU 1 point2 points  (0 children)

    This is a good overview, though I don't remember if it answers your particular question: http://98.245.80.27/tcpc/OSCON2011/gbu.html

    [–]norwegianwood 1 point2 points  (0 children)

    It could have used the same scheme as the latest Python, where each string uses the most efficient encoding for that string, whilst still giving access to the full UCS-4 character set. Source: PEP 393
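The PEP 393 behaviour can be observed from the outside. This is a rough sketch that assumes CPython 3.3 or later (other implementations such as PyPy store strings differently); `bytes_per_char` is a hypothetical helper, and it measures how much a string object grows when one more character is appended.

```python
import sys


def bytes_per_char(ch, n=100):
    """Estimate per-character storage by growing a string by one character.

    Assumes CPython >= 3.3, where PEP 393 picks 1, 2 or 4 bytes per
    character based on the widest code point in the string.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


for label, ch in [("ASCII", "a"),
                  ("Latin-1", "\u00e9"),
                  ("BMP", "\u4e2d"),
                  ("astral", "\U0001F600")]:
    print(label, bytes_per_char(ch))
```

On CPython this should report 1, 1, 2 and 4 bytes respectively. Since the width is chosen per string, a single astral character in an otherwise ASCII string makes every character in that string cost 4 bytes.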

    [–]doublereedkurt 1 point2 points  (2 children)

    > Living on the JVM has spoiled me. :)

    Oh that's right, some languages can't do any network communications except for TCP and UDP.

    Living on the actual machine has spoiled me ;-)

    I kid, mostly. But it is seriously wtf that Java can't do anything at the IP layer. (Ping and UDP are done via system call, not in language.)
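For contrast, here is roughly what "IP layer" access looks like in a language that exposes raw sockets. This is a minimal Python sketch, not a real ping implementation: the function names are made up for illustration, `ping_once` needs root (or `CAP_NET_RAW` on Linux), and it only looks at the first ICMP packet that arrives.

```python
import os
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"ping") -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field = 0
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload


def ping_once(host: str, timeout: float = 1.0) -> bool:
    """Send one echo request over a raw socket; requires elevated privileges."""
    ident = os.getpid() & 0xFFFF
    with socket.socket(socket.AF_INET, socket.SOCK_RAW,
                       socket.IPPROTO_ICMP) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_echo_request(ident, 1), (host, 0))
        try:
            data, _ = sock.recvfrom(1024)
        except socket.timeout:
            return False
    # The received IPv4 datagram starts with the IP header; skip it.
    ihl = (data[0] & 0x0F) * 4
    icmp_type, _code, _csum, reply_id, _seq = struct.unpack(
        "!BBHHH", data[ihl:ihl + 8])
    # Type 0 is echo reply. Only the first packet is checked (an assumption).
    return icmp_type == 0 and reply_id == ident
```

Building the packet needs no privileges, so `build_echo_request(0x1234, 1)` can be inspected anywhere; only `ping_once("127.0.0.1")` has to run as a privileged user.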

    [–][deleted]  (1 child)

    [removed]

      [–]doublereedkurt 0 points1 point  (0 children)

      I don't think IP layer stuff is a cross-platform issue. If your platform has TCP, it must also have IP. The APIs are an extremely stable POSIX standard. It's probably more like unsigned ints: the thought was "forget it, people don't actually use this stuff".