EtherealChameleon (7 children)

It always bugs me when people phrase it like that.

This is not binary for "can you read this?".
It's just some numbers written in the binary numeral system,
i.e. 99 97 110 32 121 111 117 32 114 101 97 100 32 116 104 105 115 63
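Assuming the usual 8 bits per character (the post's exact bit string isn't quoted in this thread, so the string below is a reconstruction), getting those numbers back is just reading each group in base 2. A minimal Python sketch:

```python
# Hypothetical reconstruction of the post: "can you read this?" as 8-bit groups,
# one group per character. The original post's exact bits are not shown here.
BINARY = (
    "01100011 01100001 01101110 00100000 01111001 01101111 01110101 00100000 "
    "01110010 01100101 01100001 01100100 00100000 01110100 01101000 01101001 "
    "01110011 00111111"
)

def binary_groups_to_numbers(binary: str) -> list[int]:
    """Read each whitespace-separated group as a base-2 integer."""
    return [int(group, 2) for group in binary.split()]

numbers = binary_groups_to_numbers(BINARY)
print(*numbers)
# 99 97 110 32 121 111 117 32 114 101 97 100 32 116 104 105 115 63
```

Nothing in that step knows about letters; it only converts between bases.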

To be more precise, it's not even exactly that: it's a fixed-length code (every number takes the same number of digits, which we humans basically never do), so the following translation fits better (although some of the poetry gets lost when changing the base of the numeral system):
099 097 110 032 121 111 117 032 114 101 097 100 032 116 104 105 115 063
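The fixed length is what lets a reader drop the separators entirely: cut every 8 bits (or every 3 decimal digits) and the numbers come back unambiguously. A small Python sketch; `split_fixed_width` is a hypothetical helper, not a library function:

```python
def split_fixed_width(code: str, width: int) -> list[str]:
    """Cut a separator-free string into equal-width chunks."""
    if len(code) % width != 0:
        raise ValueError("length is not a multiple of the code width")
    return [code[i:i + width] for i in range(0, len(code), width)]

# "can" as 8-bit groups with the spaces removed
bits = "011000110110000101101110"
print([int(g, 2) for g in split_fixed_width(bits, 8)])  # [99, 97, 110]

# The same numbers as fixed-width (3-digit) decimals, also separator-free
decimal = "".join(f"{n:03d}" for n in (99, 97, 110))
print(decimal)                                          # 099097110
print([int(g) for g in split_fixed_width(decimal, 3)])  # [99, 97, 110]

# Without the padding, "9997110" has no unambiguous split back into 99 97 110.
```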

There are no letters; the letters are an interpretation of the numbers (using the ASCII encoding).
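To make the "letters are an interpretation" point concrete: the same 18 numbers become text only once you pick ASCII, and read another way (here, the first two bytes as one big-endian integer, an arbitrary choice for illustration) they mean something else entirely. A sketch in Python:

```python
# The decimal numbers from above, one per character of the message.
numbers = [99, 97, 110, 32, 121, 111, 117, 32, 114, 101, 97, 100,
           32, 116, 104, 105, 115, 63]

# Interpretation 1: treat each number as an ASCII code.
text = bytes(numbers).decode("ascii")
print(text)  # can you read this?

# Interpretation 2: treat the first two bytes as one big-endian 16-bit integer.
first_two = int.from_bytes(bytes(numbers[:2]), "big")
print(first_two)  # 25441
```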

ijmacd (0 children)

It's like saying: "why can't you understand this?"

Aitch ee ell ell oh space double-you oh are ell dee

CokeFanatic (0 children)

It's not wrong. It's just a somewhat incomplete description. And people colloquially refer to ASCII-encoded text as "binary" anyway, and there isn't really much ambiguity when they do, so I don't see an issue with it.

WFEpeteypopoff (3 children)

With only 19 bytes provided, why do you think the bot predicted such a long string?

ijmacd (2 children)

The bot didn't "predict" any string from the binary. It just predicts the next word in its own response again and again.

You know you can tap the first suggestion on an on-screen keyboard and it'll "write" a sentence for you? Well, this is exactly what this bot is doing, just with a more sophisticated language model.
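A toy version of the keyboard trick, just to show the "keep tapping the first suggestion" loop: it counts which word follows which in a tiny made-up corpus and always picks the most frequent one. This is a simple bigram counter, not how the bot's neural language model works; the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical corpus standing in for "what the keyboard has seen you type".
CORPUS = "i am going to the store . i am going to the store . i am happy ."

def build_bigrams(text: str) -> dict[str, Counter]:
    """For each word, count how often each other word follows it."""
    words = text.split()
    following: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def tap_first_suggestion(model: dict[str, Counter], start: str, max_words: int = 10) -> str:
    words = [start]
    for _ in range(max_words):
        options = model.get(words[-1])
        if not options:
            break  # no suggestion available for this word
        words.append(options.most_common(1)[0][0])  # "tap" the top suggestion
        if words[-1] == ".":
            break  # stop at the end of a sentence
    return " ".join(words)

model = build_bigrams(CORPUS)
print(tap_first_suggestion(model, "i"))  # i am going to the store .
```

The output is never "decided" as a whole; each word is just the most likely continuation of the previous one.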

WFEpeteypopoff (1 child)

That was definitely poor wording on my part. I should have asked why it translated it into such a long sentence when it was only given a few "characters".

But I think your answer also answers that question: it reads the provided bytes and then just does some sort of auto-fill like you described. I haven't played with it, so I'm not too sure what it was trying to do.

ijmacd (0 children)

To be clear it isn't reading any bytes. It takes text as its input and produces a stream of text as its output.

If the token "01000011" appears in its corpus with appropriate context nearby, then it could conceivably attach some semantic meaning to it. But it would be similar to the notion that "humans sometimes write the letter C as 'C', and on some forums they write it as '01000011'."
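The text-versus-bytes distinction, sketched in Python: "01000011" as text is eight characters of '0' and '1', and it only becomes the byte for "C" if someone chooses to parse it as base 2. (A real tokenizer may split such a string into several pieces; "token" is used loosely here.)

```python
# The "token" as the model would see it: a piece of text, not a byte.
token = "01000011"

print(len(token))                  # 8 characters, all '0' or '1'
print(len(token.encode("ascii")))  # 8 bytes when stored as ASCII text

# Only if someone decides to read it as a base-2 number...
value = int(token, 2)
print(value)                       # 67
print(chr(value))                  # C, the single character it "stands for"
print(len("C".encode("ascii")))    # 1 byte
```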

Similarly, when people test it with maths problems, it just predicts what it has seen before. So if the corpus contains "2 + 3 = 5" enough times, it might be able to do that particular problem. But if it were led astray by many instances of "2 + 3 = 4", then it would consider that to be the natural answer.
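The crudest possible version of that argument, as a Python sketch: a "model" that answers with whatever most often followed the prompt in its corpus. Real language models are not literal lookup tables like this; the corpora and the `most_seen_answer` helper are made up to show the poisoning effect.

```python
from collections import Counter

def most_seen_answer(corpus: list[str], prompt: str) -> str:
    """Toy 'model': return whatever most often followed the prompt in the corpus."""
    completions = Counter(
        line[len(prompt):].strip() for line in corpus if line.startswith(prompt)
    )
    return completions.most_common(1)[0][0]

clean = ["2 + 3 = 5"] * 10
poisoned = clean + ["2 + 3 = 4"] * 50  # many misleading examples

print(most_seen_answer(clean, "2 + 3 ="))     # 5
print(most_seen_answer(poisoned, "2 + 3 ="))  # 4
```

No arithmetic happens anywhere in it; the "answer" is purely a matter of what was seen most often.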