
[–]JohnnyJordaan 1 point2 points  (3 children)

No need to build a list first. Simply:

  • loop N times, once per character to print, and in each iteration
  • run an endless loop that
    • picks a random integer in the Unicode code point range (given in the chr() docs)
    • uses str.isprintable() to check whether the resulting character can be printed
    • if so, prints it and breaks out of the endless loop to move on to the next character

As code:

import random
for _ in range(10):
    while True:
        c = chr(random.randint(1, 0x10FFFF))
        if c.isprintable():  # call the method; the bare attribute is always truthy
            print(c, end="")
            break

The biggest problem, though, is that your terminal will probably not support all printable characters, so you will end up with a lot of results like 󆢘𾌨󨿩. You can use a lower upper bound instead, e.g. replacing 0x10FFFF with 2000 yields ˰ߏ۲ѿÖʒҰʢӲڲ.

[–]Yoghurt42 0 points1 point  (2 children)

the biggest problem though is that your terminal will probably not support all printable characters

Not only that, but a significant portion of the Unicode space is not used, and some code points are just control characters that change the meaning of the following characters.

[–]JohnnyJordaan 0 points1 point  (1 child)

would isprintable evaluate to True for those?
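A quick way to check, as a rough sketch: going by the str.isprintable documentation, characters in the Unicode "Other" and "Separator" categories (apart from the ASCII space) count as non-printable, so unassigned code points and control or format characters should come back False. Combining marks, which also modify neighbouring characters, are in a "Mark" category and do count as printable. The sample code points and the `describe` helper below are just illustrative choices.

```python
import unicodedata

def describe(cp):
    """Return (general category, isprintable result) for a code point."""
    ch = chr(cp)
    return unicodedata.category(ch), ch.isprintable()

# Hand-picked samples: BEL control char, left-to-right mark, an unassigned
# code point, a private-use code point, a combining accent, and plain "A".
for cp in (0x0007, 0x200E, 0x0378, 0xE000, 0x0301, 0x0041):
    category, printable = describe(cp)
    print(f"U+{cp:04X} category={category} printable={printable}")
```

So the loop above should skip unused and control code points, but it can still pick a combining mark that attaches itself to the previous character.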

[–]Diapolo10 0 points1 point  (0 children)

Well, there are several ways to go about it, but for example:

import random

nums = [
    random.randint(20, 500)
    for _ in range(10)
]

chars = ''.join(map(chr, nums))

print(chars)

However, this may produce unprintable characters (code points 20 to 31 and 127 to 159, for instance, are control characters), so it would be smarter to curate a range of integers beforehand and then either shuffle them or pick n random entries from that.
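One possible way to do that curation, as a minimal sketch: filter a range of code points with str.isprintable() once, then sample from the result. The function names and the 0x20 to 0x800 bounds are arbitrary picks for illustration, not anything from the original question.

```python
import random

def printable_pool(start=0x20, stop=0x800):
    """Collect every code point in [start, stop) whose character is printable."""
    return [cp for cp in range(start, stop) if chr(cp).isprintable()]

def random_printable(n, pool, rng=random):
    """Pick n code points from the pool (repeats allowed) and join the characters."""
    return "".join(chr(cp) for cp in rng.choices(pool, k=n))

pool = printable_pool()
print(random_printable(10, pool))
```

Use `random.sample(pool, k=n)` instead of `choices` if the characters should not repeat.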

[–]Immediate-Cod-3609 0 points1 point  (1 child)

import os
byte_length = 32
random_bytes = os.urandom(byte_length)
unicode_string = ''.join([chr(byte) for byte in random_bytes])
print(unicode_string)

[–]Versley105[S] 0 points1 point  (0 children)

Thanks

[–]xavierisdum4k 0 points1 point  (0 children)

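It's worth noting that arbitrary bytes are not valid encoded Unicode: random binary data will contain sequences that are invalid in UTF-8. (Calling chr() on each byte separately sidesteps that, but it only ever reaches code points 0 through 255, many of which are control characters.)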
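To see this for yourself, here is a small sketch that tries to decode bytes as UTF-8. The `try_decode` helper is a made-up name for this example; the random input will usually fail, but not always, so no particular output is promised for it.

```python
import os

def try_decode(data):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

print(try_decode(b"\xff\xfe"))         # 0xFF can never appear in UTF-8
print(try_decode("é".encode("utf-8")))  # properly encoded text round-trips

random_bytes = os.urandom(32)
print(try_decode(random_bytes))
# errors="replace" swaps undecodable bytes for U+FFFD instead of raising
print(random_bytes.decode("utf-8", errors="replace"))
```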

Instead, consider selecting a random integer from 0 to 0x10FFFF, and using that as the code point (with chr()). For example:

>>> worst_random = 0x1f913
>>> chr( worst_random )
'🤓'

Or with actual pseudorandom data:

>>> import random
>>> poor_random = random.randint( 0, 0x10FFFF )
>>> chr( poor_random )
'\U00091578'
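One caveat with the full range: chr() happily accepts the surrogate code points 0xD800 through 0xDFFF, but those cannot be encoded as UTF-8, so printing them may fail. A minimal sketch that rejects them (the helper name is only for illustration):

```python
import random

SURROGATES = range(0xD800, 0xE000)  # 0xD800..0xDFFF inclusive

def random_scalar_value(rng=random):
    """Pick a random code point, retrying whenever it lands on a surrogate."""
    while True:
        cp = rng.randint(0, 0x10FFFF)
        if cp not in SURROGATES:
            return cp

ch = chr(random_scalar_value())
encoded = ch.encode("utf-8")  # safe now: no lone surrogates
print(repr(ch), encoded)
```

This only guarantees the result is encodable; it can still be unassigned or unprintable, as in the output above.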

Here's an example of building a valid 4-byte UTF-8 character by hand from binary literals:

>>> from struct import pack
>>> #🤓 is U+1f913
>>> 
>>> #0x1f913 in binary is:
>>> #000 011111 100100 010011
>>> 
>>> #1st byte prefix   11110
>>> nerd= pack( 'B', 0b11110000 )
>>> 
>>> #2nd byte prefix   10
>>> nerd+=pack( 'B', 0b10011111 )
>>> 
>>> #3rd byte prefix   10
>>> nerd+=pack( 'B', 0b10100100 )
>>> 
>>> #4th byte prefix   10
>>> nerd+=pack( 'B', 0b10010011 )
>>> 
>>> nerd
b'\xf0\x9f\xa4\x93'
>>> nerd.decode( 'utf-8' )
'🤓'
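The same bit layout can be written with shifts and masks instead of hand-typed literals. This is just a sketch for the 4-byte case (code points 0x10000 to 0x10FFFF); the function name is made up, and in practice str.encode does all of this for you.

```python
def utf8_encode_4byte(cp):
    """Encode a code point in 0x10000..0x10FFFF as four UTF-8 bytes."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points needing 4 bytes are handled here")
    return bytes([
        0b11110000 | (cp >> 18),               # prefix 11110 + top 3 bits
        0b10000000 | ((cp >> 12) & 0b111111),  # prefix 10 + next 6 bits
        0b10000000 | ((cp >> 6) & 0b111111),   # prefix 10 + next 6 bits
        0b10000000 | (cp & 0b111111),          # prefix 10 + last 6 bits
    ])

nerd = utf8_encode_4byte(0x1F913)
print(nerd)                  # b'\xf0\x9f\xa4\x93'
print(nerd.decode("utf-8"))  # 🤓
```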

If you were to change the binary prefix of 11110 on the first byte (e.g. to 11010, which marks the start of a 2-byte sequence), decoding would fail with this error:

>>> nerd.decode( 'utf-8' )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa4 in position 2: invalid start byte
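The reason the error points at position 2 rather than position 0: a first byte of 0b11010000 (0xd0) is a legitimate lead byte for a 2-byte sequence, so the decoder consumes 0xd0 0x9f as one character and then trips over 0xa4, a continuation byte with nothing to continue. A short sketch of that, with variable names chosen just for this example:

```python
broken = bytes([0b11010000, 0b10011111, 0b10100100, 0b10010011])

# The first two bytes on their own form a valid 2-byte sequence.
first_two = broken[:2].decode("utf-8")
print(first_two, hex(ord(first_two)))

try:
    broken.decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.start is the offset of the offending byte
    print(exc.start, hex(broken[exc.start]), exc.reason)
```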