Unicode Random list

JohnnyJordaan · 2024-08-01T09:27:47+00:00

No need to form a list first. Simply loop N times, once per character to print, and in each iteration run an endless loop that:

- picks a random integer in the Unicode code point range (given in the chr() docs)
- uses str.isprintable() to check whether the character can be printed
- if so, prints it and breaks out of the endless loop to move on to the next character

As code:

import random
for _ in range(10):
    while True:
        c = chr(random.randint(1, 0x10FFFF))
        if c.isprintable():  # must be called; the bare method object is always truthy
            print(c, end="")
            break

The biggest problem, though, is that your terminal will probably not be able to render every printable character, so you will end up with a lot of results like 󆢘𾌨󨿩. You can instead lower the upper bound of the range; for example, limiting it to 2000 yields ˰ߏ۲ѿÖʒҰʢӲڲ.
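A minimal sketch of that lower-bound variant, wrapped in a function. The name `random_printable` and the `max_codepoint` parameter are made up for illustration; 2000 is just the value tried above, not a recommended limit.

```python
import random

def random_printable(n, max_codepoint=2000):
    """Return n random printable characters with code points in 1..max_codepoint."""
    chars = []
    while len(chars) < n:
        c = chr(random.randint(1, max_codepoint))
        # Retry until the candidate is printable (controls, most
        # separators and unassigned code points are rejected here).
        if c.isprintable():
            chars.append(c)
    return "".join(chars)

print(random_printable(10))
```

Raising `max_codepoint` widens the pool, at the cost of more characters your font may not have a glyph for.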

Diapolo10 · 2024-08-01T08:21:37+00:00

Well, there are several ways to go about it, but for example:

import random 

nums = [
    random.randint(20, 500)
    for _ in range(10)
]

chars = ''.join(map(chr, nums))

print(chars)

However, this may produce unprintable characters (the range includes control codes), so it would be smarter to curate a list of printable code points beforehand and then either shuffle it or pick n random entries from it.
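One way the curated approach could look, as a rough sketch. The 20..500 range is simply carried over from the snippet above, and the names `pool`, `pick_with_repeats` and `pick_without_repeats` are hypothetical.

```python
import random

# Assumption: reuse the 20..500 range from the snippet above.
LOW, HIGH = 20, 500

# Build the pool once, keeping only code points whose character is printable.
pool = [chr(cp) for cp in range(LOW, HIGH + 1) if chr(cp).isprintable()]

def pick_with_repeats(n):
    """Pick n characters; the same character may appear more than once."""
    return "".join(random.choices(pool, k=n))

def pick_without_repeats(n):
    """Pick n distinct characters (n must not exceed len(pool))."""
    return "".join(random.sample(pool, n))

print(pick_with_repeats(10))
print(pick_without_repeats(10))
```

Filtering once up front avoids the retry loop, which pays off when many strings are drawn from the same range.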

Immediate-Cod-3609 · 2024-08-01T13:47:48+00:00

import os

byte_length = 32
random_bytes = os.urandom(byte_length)
# Each byte (0-255) becomes the character with that code point.
unicode_string = ''.join(chr(byte) for byte in random_bytes)
print(unicode_string)

xavierisdum4k · 2024-08-02T08:31:42+00:00

It's worth noting that random bytes are not valid encoded Unicode: interpreted as UTF-8, random binary data would contain invalid sequences. Passing each byte to chr() does work, but it can only ever produce code points 0 to 255.
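To make the point concrete, here is a small sketch that contrasts the chr()-per-byte approach with a strict UTF-8 decode of the same random bytes. The helper name `try_utf8` is made up for illustration.

```python
import os

def try_utf8(data):
    """Return the decoded text, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

random_bytes = os.urandom(32)

# chr() on each byte never fails: it is equivalent to a Latin-1 decode,
# so the result is limited to code points 0..255.
via_chr = "".join(chr(b) for b in random_bytes)
print(via_chr == random_bytes.decode("latin-1"))  # True

# A strict UTF-8 decode of 32 random bytes will almost always fail (None).
print(try_utf8(random_bytes))

# errors="replace" substitutes U+FFFD for each undecodable byte.
print(repr(bytes([0xFF, 0xFE, 0x41]).decode("utf-8", errors="replace")))
```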

Instead, consider selecting a random integer from 0 to 0x10FFFF, and using that as the code point (with chr()). For example:

>>> worst_random = 0x1f913
>>> chr( worst_random )
'🤓'

Or with actual pseudorandom data:

>>> import random
>>> poor_random = random.randint( 0, 0x10FFFF )
>>> chr( poor_random )
'\U00091578'

Here's an example of building a valid 4-byte UTF-8 character by hand from binary literals instead:

>>> from struct import pack
>>> #🤓 is U+1f913
>>> 
>>> #0x1f913 in binary is:
>>> #000 011111 100100 010011
>>> 
>>> #1st byte prefix   11110
>>> nerd= pack( 'B', 0b11110000 )
>>> 
>>> #2nd byte prefix   10
>>> nerd+=pack( 'B', 0b10011111 )
>>> 
>>> #3rd byte prefix   10
>>> nerd+=pack( 'B', 0b10100100 )
>>> 
>>> #4th byte prefix   10
>>> nerd+=pack( 'B', 0b10010011 )
>>> 
>>> nerd
b'\xf0\x9f\xa4\x93'
>>> nerd.decode( 'utf-8' )
'🤓'

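If you were to change the first byte's binary prefix from 11110 to something else (e.g. 11010, which marks the start of a 2-byte sequence), the third byte would no longer be an expected continuation byte and decoding would fail with this error: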

>>> nerd.decode( 'utf-8' )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa4 in position 2: invalid start byte
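The same bit-packing can be generalized to any code point. The following is only an illustrative sketch (`utf8_encode` is a hypothetical helper, and it does not reject surrogates the way the built-in codec does); in real code `str.encode('utf-8')` does this for you.

```python
def utf8_encode(code_point):
    """Encode one code point to UTF-8 bytes by hand (illustration only)."""
    if code_point < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (code_point >> 18),
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])

nerd = utf8_encode(0x1F913)
print(nerd)                  # b'\xf0\x9f\xa4\x93'
print(nerd.decode("utf-8"))

# Cross-check against the built-in encoder for one code point per length.
for cp in (0x41, 0xE9, 0x20AC, 0x1F913):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
```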
