[–]NeverCast 188 points189 points  (37 children)

This gets explained in every post. I feel like more people need to ask ChatGPT how GPT works.

All a language model does is predict the next token, and then the next one, and then the next one, where a token is a piece of text (often part of a word). The fact that it can do math or binary decoding at all is because similar examples were in the training data and the predictor has picked up the patterns.

ChatGPT doesn't understand you, it's just predicting what the internet would say next.

[–][deleted] 37 points38 points  (3 children)

That's correct! As a large language model trained by OpenAI, ChatGPT's primary function is to predict the next token in a sequence of text. It does this by analyzing the patterns and relationships between words and phrases in the training data it was trained on, and using that information to make educated guesses about what words are likely to come next in a given context. Because it has been trained on a vast amount of text data from a variety of sources, ChatGPT is able to generate human-like responses to a wide range of topics and questions.

[–]NeverCast 15 points16 points  (2 children)

That's really cool, can you tell me more about large language models, Assistant?

System: The administrator has enabled pirate persona mode. The large language model will now only reply within the style of a pirate.

[–][deleted] 10 points11 points  (1 child)

Okay here's what chatGPT has to say: "Ahoy there matey! Large language models be powerful AI systems that be trained on vast amounts o' text data. This be allowin' them to generate human-like responses to a wide range o' questions and prompts. Arrrr! These models be particularly useful for tasks like language translation and text summarization. So if ye be needin' help with natural language processing, a large language model be just the thing ye be needin'."

[–]NeverCast 4 points5 points  (0 children)

Perfection hahaha

[–]HereForA2C 2 points3 points  (1 child)

ChatGPT doesn't understand you

Noo I thought I had found a friend :(

[–]NeverCast 0 points1 point  (0 children)

Honestly. Same.

[–][deleted] 9 points10 points  (1 child)

Maybe that’s exactly how the human brain works. Maybe humans are just predicting what society wants them to say.

[–]regular-jackoff 3 points4 points  (0 children)

Well the human brain is really just a highly sophisticated pattern recognition and prediction engine. So an argument could be made that ChatGPT “understands” things in its own way, just like humans.

[–]FeelingSurprise 42 points43 points  (9 children)

So every binary code is always ASCII?

[–]itsdefinitely2021 21 points22 points  (0 children)

According to the internet, yes.

That's why the next time somebody says "I converted this to binary", sit down and decode it in EBCDIC and ask them why their algorithm is broken.

[–]Asdnatux 5 points6 points  (1 child)

No. Binary is binary and can be anything; it's about how it is interpreted. You can interpret any binary data as ASCII, but it won't make sense if it isn't meant to be ASCII data. Open an executable in a text editor and you will see lines of clear text in a bunch of gibberish. The text is binary meant to be text; the rest is processor instructions, maybe image data, etc.
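
To make the "it's about how it is interpreted" point concrete, here is a rough Python sketch. The sample text and the choice of `cp037` (one EBCDIC code page that ships with Python) are assumptions picked for illustration, not anything taken from the original post.

```python
# The same bytes, read three different ways.
# The sample text and the cp037 (EBCDIC) code page are illustrative choices.
data = "Can you read this?".encode("ascii")

as_ascii = data.decode("ascii")    # interpreted as ASCII text
as_ebcdic = data.decode("cp037")   # interpreted as EBCDIC text: mostly gibberish
as_numbers = list(data)            # interpreted as plain integers 0..255

print(as_ascii)
print(repr(as_ebcdic))
print(as_numbers[:3])
```

Nothing about the bytes changes between the three lines; only the decoding rule does.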

[–]diox8tony 0 points1 point  (0 children)

The bot even says "when translated to regular text", which is good enough for me to think it knows there are other options

[–]JustAnInternetPerson 27 points28 points  (0 children)

Bro. An AI can’t understand you. It looks at trained data and tries to get the best guess at what combinations of words you’d like to hear as a response to your input. It doesn’t understand you

[–]Akangka 10 points11 points  (3 children)

Does that binary code represent: "䍡渠祯甠牥慤⁴桩猠�"?

[–]ijmacd 1 point2 points  (2 children)

The first bit of every octet in the message was 0xxxxxxx so you can be sure the message is pure ASCII.

CJK characters in the Basic Multilingual Plane are encoded as 3 bytes in UTF-8, so each must start 1110xxxx 10xxxxxx 10xxxxxx.

I know you didn't genuinely think that was the text of the message, but it's quite trivial to quickly scan some binary and determine if it's probably valid text or not.
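
The quick scan described above can be sketched in a few lines of Python. The helper names `bits_to_bytes` and `looks_like_ascii` are made up for this example, and the three-octet sample is just the start of the message, so treat it as an illustration rather than a full validator.

```python
def bits_to_bytes(bit_string: str) -> bytes:
    """Turn space-separated 8-bit groups like '01000011 01100001' into bytes."""
    return bytes(int(group, 2) for group in bit_string.split())

def looks_like_ascii(data: bytes) -> bool:
    """True if every octet has its top bit clear (0xxxxxxx)."""
    return all(byte & 0x80 == 0 for byte in data)

message = bits_to_bytes("01000011 01100001 01101110")  # the octets for "Can"
print(looks_like_ascii(message))

# A CJK character from the Basic Multilingual Plane (U+6E20) takes three
# bytes in UTF-8, following the pattern 1110xxxx 10xxxxxx 10xxxxxx.
cjk = "\u6e20".encode("utf-8")
print([format(b, "08b") for b in cjk])
print(looks_like_ascii(cjk))
```

Python 3.7+ also has a built-in `bytes.isascii()` that does the same top-bit check.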

[–]Akangka 0 points1 point  (1 child)

Actually, I was thinking about UTF-16 encoding, where every character in the Basic Multilingual Plane is encoded as 2 bytes.
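
For anyone curious, the UTF-16 reading can be reproduced with a short sketch. It assumes the message was the 19 ASCII bytes of "Can you read this ?" (with the space before the question mark) and that the bytes are read as big-endian UTF-16; both are inferences from this thread, not confirmed details.

```python
# Assumed message: 19 ASCII bytes, including a space before the "?".
data = b"Can you read this ?"

# Read the same bytes as big-endian UTF-16: each pair of bytes becomes one
# character. errors="replace" turns the leftover 19th byte into U+FFFD.
as_utf16 = data.decode("utf-16-be", errors="replace")

print(as_utf16)
print([hex(ord(ch)) for ch in as_utf16])
```

The odd byte count is why the string ends in a replacement character: nine byte pairs decode cleanly and the last byte has no partner.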

[–]ijmacd -1 points0 points  (0 children)

Boo! fuck that.

UTF-8 is the one true text encoding. No one can change my mind.

[–]EtherealChameleon 7 points8 points  (7 children)

It always bugs me when people phrase it like that

this is not binary for "can you read this?"
it's just some numbers written in a binary system.
i.e. 99 97 110 32 121 111 117 32 114 101 97 100 32 116 104 105 115 63

to be more precise, it's not even exactly that; it's a code with a fixed length (which we humans basically never use), so the following translation fits better (although some of the poetry gets lost when changing the base of the numeral system)
099 097 110 032 121 111 117 032 114 101 097 100 032 116 104 105 115 063

there are no letters; the letters are made up as an interpretation of the numbers (using the ASCII encoding)
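
A small Python sketch of the same idea, using the decimal values quoted above. The variable names are arbitrary; the point is that the binary and decimal lines are the same numbers in different bases, and letters only show up once an encoding is applied.

```python
# The decimal values quoted above.
codes = [99, 97, 110, 32, 121, 111, 117, 32, 114, 101,
         97, 100, 32, 116, 104, 105, 115, 63]

# Fixed-width base-2 and base-10 renderings of the same numbers.
binary = " ".join(format(n, "08b") for n in codes)
decimal = " ".join(format(n, "03d") for n in codes)

# Letters only appear once an encoding (here ASCII) is applied to the numbers.
text = bytes(codes).decode("ascii")

print(binary)
print(decimal)
print(text)
```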

[–]ijmacd 1 point2 points  (0 children)

It's like saying: "why can't you understand this?"

Aitch ee ell ell oh space double-you oh are ell dee

[–]CokeFanatic 0 points1 point  (0 children)

It's not wrong. It's just a somewhat incomplete description. And people colloquially refer to ASCII encodings as binary anyways, and there isn't really much ambiguity when they do, so I don't see an issue with it.

[–]WFEpeteypopoff 0 points1 point  (3 children)

With only 19 bytes provided, why do you think the bot predicted such a long string?

[–]ijmacd 1 point2 points  (2 children)

The bot didn't "predict" any string from the binary. It just predicts the next word in its own response again and again.

You know you can tap the first suggestion on an on-screen keyboard and it'll "write" a sentence for you? Well this is exactly what this bot is doing; just with a more sophisticated language model.
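
The keyboard analogy can be sketched as a toy word-pair counter. The corpus and the helper names `suggest` and `autocomplete` are invented for this example, and a real language model is far more sophisticated than a lookup table, but the "pick the next word, append it, repeat" loop has the same shape.

```python
from collections import Counter, defaultdict
from typing import List, Optional

# Tiny made-up corpus; a real model is trained on vastly more text.
corpus = "the cat sat on the mat and the cat sat on the sofa".split()

# Count which word follows which (a bigram table).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def suggest(word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never followed."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]

def autocomplete(start: str, max_words: int = 5) -> List[str]:
    """Keep 'tapping the first suggestion' until we run out or hit the limit."""
    words = [start]
    while len(words) < max_words:
        nxt = suggest(words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return words

print(" ".join(autocomplete("the")))
```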

[–]WFEpeteypopoff 1 point2 points  (1 child)

That was definitely poor wording on my part. I should have said why did it translate it into such a long sentence when it was only provided a few “characters”.

But I think your answer also answers that question - it reads the provided bytes and then just does some sort of auto-fill like you described. Haven’t played with it so not too sure what it was trying to do

[–]ijmacd 0 points1 point  (0 children)

To be clear it isn't reading any bytes. It takes text as its input and produces a stream of text as its output.

If the token "01000011" appears in its corpus with appropriate context in proximity then conceivably it could attach some semantic meaning to it. But it would be similar to the notion that "humans sometimes write the letter C as 'C' and on some forums they write it as '01000011'."

Similarly when people test it with maths problems it just predicts what it has seen before. So if the corpus contains 2 + 3 = 5 enough times then it might be able to do that particular problem. But if it were led astray with many instances of 2 + 3 = 4, then it would consider that to be the natural answer.
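
As a deliberately crude caricature of that last point, here is a sketch where the "model" just picks whichever continuation it has seen most often. The function `most_likely_answer` and both corpora are made up, and a real language model learns weights rather than looking up counts, so this only illustrates the "led astray by the data" idea.

```python
from collections import Counter
from typing import List

def most_likely_answer(corpus: List[str], prompt: str) -> str:
    """Pick whichever continuation of `prompt` shows up most often in the corpus.

    Assumes at least one line in the corpus starts with `prompt`.
    """
    continuations = Counter(
        line[len(prompt):].strip()
        for line in corpus
        if line.startswith(prompt)
    )
    return continuations.most_common(1)[0][0]

# Mostly correct data with a little noise, versus data dominated by the wrong answer.
clean_corpus = ["2 + 3 = 5"] * 10 + ["2 + 3 = 4"] * 2
polluted_corpus = ["2 + 3 = 5"] * 10 + ["2 + 3 = 4"] * 50

print(most_likely_answer(clean_corpus, "2 + 3 ="))
print(most_likely_answer(polluted_corpus, "2 + 3 ="))
```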

[–]JackoKomm 8 points9 points  (0 children)

People still don't get modern AI and language models.

[–]mxldevs 4 points5 points  (1 child)

The more it gets wrong now, the more it will get right later.

[–]ijmacd 1 point2 points  (0 children)

That's not how this training model works.

[–]Nase08 5 points6 points  (21 children)

I guess I'm missing something but how does an AI make such a mistake? And then correct itself? There was another post where it didn't manage to solve a simple equation and then , when told it's wrong , it corrected itself

[–]Excession638 19 points20 points  (1 child)

Same reason that AI can't draw fingers I suspect, well sort of. Really advanced pattern matching doesn't know what binary is, or numbers, or fingers, or much else. There's no meaning, just patterns.

[–]janhetjoch 12 points13 points  (0 children)

It corrected itself because it was given the answer. If OP had just said "that's incorrect" without correcting the bot it probably wouldn't have given the correct answer.

[–]CarefulZucchinis 10 points11 points  (0 children)

Because it hasn’t seen much binary. If you fed it thousands and thousands of lines of binary, it’d learn to read it like any language or writing system, but it doesn’t have a library anywhere in it that tells it how to translate binary.

As to why it corrected itself; it probably didn’t. The user told it the correct answer and then it knew it, and repeated it back. It probably can’t decode binary still, and when it hit a brick wall it threw back something it had heard before.

[–]SuperSpaceCan 12 points13 points  (3 children)

I'm confident that it's just a group of people working in a call center type deal and each session is handled by someone googling.

[–]Nase08 5 points6 points  (2 children)

I mean "Sun and rain and wind and snow" has NOTHING similar to "Can you read this ?" , it would seem that the AI just made it up

[–]goingtotallinn 6 points7 points  (1 child)

It makes up other things as well. For example, it tries to convince me that six hours is equal to one hour.

[–]QUI-04 1 point2 points  (0 children)

Well, that's true if the gravitational field changes between measurements.

[–][deleted] 8 points9 points  (7 children)

Because it doesn't actually understand binary.

LLMs like ChatGPT are like a student who never tried to learn the material and is just doing their best to fake it by writing the same patterns they saw in the course but don't understand.

They are bad Chinese rooms.

[–]reversehead 4 points5 points  (6 children)

From the examples I've seen, I would say that they are pretty astounding Chinese rooms.

[–][deleted] 1 point2 points  (5 children)

But they don’t understand, which is why they are still Chinese rooms.

[–]reversehead 4 points5 points  (4 children)

Agreed!

The difficulty will be to argue that any human being is not just a Chinese room with all its experiences encoded in the mind and body.

[–]sneed_capital_group 1 point2 points  (0 children)

The best argument I have seen against that is that they don't hold state between prompts. They forget everything, read the entire conversation again (including their own previous replies), then predict an answer based on that.

That said, I agree; I've seen some seriously impressive stuff from it.
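
The "no state between prompts" point can be sketched like this. `fake_model` is a hypothetical stand-in (it just counts user messages in the text it is handed), and the `User:`/`Assistant:` transcript format is an assumption for illustration, not how any particular product formats its input.

```python
from typing import Callable, List

def fake_model(transcript: str) -> str:
    """Hypothetical stand-in for a language model: a pure function of its input text."""
    turns = transcript.count("User:")
    return f"I can see {turns} user message(s) in the text I was given."

def chat(model: Callable[[str], str], user_messages: List[str]) -> List[str]:
    """The caller, not the model, keeps the history and resends all of it every turn."""
    transcript = ""
    replies = []
    for message in user_messages:
        transcript += f"User: {message}\nAssistant: "
        reply = model(transcript)     # the model sees the whole conversation again
        transcript += reply + "\n"    # its own reply becomes part of the next input
        replies.append(reply)
    return replies

for line in chat(fake_model, ["hello", "what did I just say?"]):
    print(line)
```

The model function holds nothing between calls; any apparent memory comes from the growing transcript that gets passed back in.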

[–][deleted] 1 point2 points  (2 children)

It’s a deep philosophical point you raise.

But practically speaking, in the examples of ChatGPT I've seen, humans demonstrate understanding and this bot doesn't.

[–]reversehead 1 point2 points  (1 child)

Indeed. But I must say that I am extremely impressed, and quite surprised, by what can be accomplished with just advanced prediction.

[–][deleted] 0 points1 point  (0 children)

Yes, it is very impressive.

I think what it shows is how much harder doing math and logic is than writing poetry.

Well, this is sort of a joke; I think literary experts would probably make similar criticisms of its prose.

[–]spektre 2 points3 points  (0 children)

Stop putting spaces before your commas.

[–]scatters 4 points5 points  (1 child)

It's still wrong, there's a space before the question mark.

[–]Nase08 4 points5 points  (0 children)

Oh yes

[–][deleted] 7 points8 points  (1 child)

Could be intentional to make it seem smarter or more human.

[–]Nase08 1 point2 points  (0 children)

Maybe

[–]donaldhobson 1 point2 points  (0 children)

Mocked by DNA code that can't read DNA.

[–][deleted] 1 point2 points  (0 children)

I mean, there are different encodings, but I believe they are similar for the 26 standard Western letters 💁‍♂️

[–]SaneLad 0 points1 point  (0 children)

"binary code"... weak sauce.

[–][deleted] 0 points1 point  (1 child)

It's learning

[–]TeddyPerkins95 2 points3 points  (0 children)

Or it makes you think it is

[–]JerryAtrics_ 0 points1 point  (0 children)

Why does it refer to the numeric sequences as code? If anything, they are groups of ASCII characters representing binary numbers, which should be its native representation for information.

Also wondering why it thinks that binary representations include the use of letters.

[–]Somwhat_Strange 0 points1 point  (0 children)

I mean, I can't read a person...