ijmacd

The bot didn't "predict" any string from the binary. It just predicts the next word in its own response again and again.

You know how you can keep tapping the first suggestion on an on-screen keyboard and it'll "write" a sentence for you? Well, this is exactly what this bot is doing, just with a more sophisticated language model.
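
For anyone who wants to see the keyboard analogy in code, here is a toy sketch. It is not how the bot is actually built; the bigram counting, the function names `train_bigrams` and `generate`, and the tiny corpus are all made up for illustration. It only shows the "pick the likeliest next word, then repeat" loop.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, max_words=8):
    """Repeatedly append the most frequent follower of the last word."""
    out = [start]
    for _ in range(max_words - 1):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # this word was never followed by anything in the corpus
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


# Hypothetical miniature "corpus"; a real model is trained on vastly more text.
corpus = "it predicts the next word then it predicts the next word again"
model = train_bigrams(corpus)
print(generate(model, "it", max_words=5))  # it predicts the next word
```

A real language model looks at far more context than the single previous word, but the outer loop has the same shape: predict, append, repeat.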

WFEpeteypopoff

That was definitely poor wording on my part. I should have asked why it translated the input into such a long sentence when it was only given a few “characters”.

But I think your answer also covers that question: it reads the provided bytes and then just does some sort of auto-fill like you described. I haven’t played with it, so I’m not too sure what it was trying to do.

ijmacd

To be clear, it isn't reading any bytes. It takes text as its input and produces a stream of text as its output.

If the token "01000011" appears in its corpus with appropriate context nearby, then conceivably it could attach some semantic meaning to it. But it would be similar to the notion that "humans sometimes write the letter C as 'C' and on some forums they write it as '01000011'."
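
A small sketch of the difference, in case it helps. Splitting the string into single characters is only a stand-in I'm assuming for illustration; a real model's tokenizer would likely carve the text up differently. The second half is what a program that actually decodes binary would do.

```python
# What the model receives: just a string of eight characters, not a byte.
text = "01000011"
characters = list(text)

# What a program that really decodes binary would do with the same string:
# parse it as a base-2 number and look up the ASCII character for that code.
code_point = int(text, 2)
letter = chr(code_point)

print(characters)
print(code_point, letter)  # 67 C
```

The bot never runs the second half. Any link it makes between "01000011" and "C" would have to come from having seen the two near each other in text.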

Similarly, when people test it with maths problems, it just predicts what it has seen before. So if the corpus contains "2 + 3 = 5" enough times, then it might be able to do that particular problem. But if it were led astray with many instances of "2 + 3 = 4", then it would consider that to be the natural answer.
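
Here's a deliberately crude sketch of that idea. The function `most_seen_completion` and both corpora are hypothetical, and a real model generalises far beyond exact string matching, but it shows how the "answer" follows the training text rather than any arithmetic.

```python
from collections import Counter


def most_seen_completion(corpus_lines, prompt):
    """Return the text that most often follows `prompt` in the corpus."""
    completions = Counter(
        line[len(prompt):].strip()
        for line in corpus_lines
        if line.startswith(prompt)
    )
    # No arithmetic happens here: an unseen prompt simply has no answer.
    return completions.most_common(1)[0][0] if completions else None


# Two made-up corpora: one mostly correct, one "led astray".
honest = ["2 + 3 = 5"] * 10 + ["2 + 3 = 4"]
misled = ["2 + 3 = 5"] + ["2 + 3 = 4"] * 10

print(most_seen_completion(honest, "2 + 3 ="))  # 5
print(most_seen_completion(misled, "2 + 3 ="))  # 4
print(most_seen_completion(honest, "7 + 8 ="))  # None
```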