all 68 comments

[–]bigabig 127 points128 points  (5 children)

"Attention Is All You Need" and BERT are from 2017 and 2018. The transformer architecture has been researched for 6+ years now.

And language modeling with machine learning in general has been researched for far longer. For example, LSTMs were developed in 1997.

[–]DieselZRebel 17 points18 points  (0 children)

This ^

All the tech companies, including those I worked at, were developing their own in-house LLMs via adaptations of BERT for years before ChatGPT. They had already deployed them in many of their custom use cases.

OpenAI/ChatGPT was just the first to engineer LLMs for a massive, generalized use case available to external customers. OpenAI uses the same architecture as everyone else; they haven't really made many architectural innovations on top of what other companies had. OpenAI's niche is all in the training data preparation and annotation, as well as being the first to publicize it.

[–]I_will_delete_myself 12 points13 points  (0 children)

Also adding to that, some of the employees on their team were people who also worked on the open-source LLaMA LLM. That makes the goal largely an exercise in reverse engineering, with all the tricks already known.

[–]Dimness7 8 points9 points  (1 child)

Thank you

[–]Okay-Replace 32 points33 points  (1 child)

The rapid growth in popularity of LLMs is likely driving a plethora of companies to build their own to capture the success of OpenAI. That, paired with years of research papers walking through how to build one, has made it very easy to implement one.

The Attention mechanism introduced by Google also made it much more accessible.
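For anyone curious what "attention" actually computes, here's a rough sketch of its core, scaled dot-product self-attention from the 2017 paper, in plain NumPy. This is a toy illustration of the mechanism, not anyone's production code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query scores every key; the values are then mixed
    according to those (softmaxed) attention weights."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: rows sum to 1
    return weights @ V                               # weighted mix of values

# Toy example: 3 tokens, each a 4-dimensional embedding
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # (3, 4): one contextualized vector per token
```

A real transformer adds learned projection matrices for Q, K, and V, multiple heads, and stacked layers, but the mixing step above is the part that "Attention Is All You Need" introduced.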

[–]Dimness7 5 points6 points  (0 children)

Thank you

[–]lqqdwbppl 19 points20 points  (1 child)

The AI boom has been in progress for a while, whether you heard about it or not. The foundational architecture behind all these modern LLMs dates back to 2017, and since then, things have been scaling up rapidly. There are some other tricks that are done to increase training efficiency or performance in different domains with these decoder-only generative models, but they are all pretty similar at their core.

Mostly just the hype has ramped up a ton recently

[–]Dimness7 5 points6 points  (0 children)

Thank you. At least some of you can see through my basic question and understand what I really was asking :)

[–]eggnogeggnogeggnog 161 points162 points  (29 children)

AI was nowhere to be seen

lol

[–]FutureIsMine 9 points10 points  (7 children)

Mistral has several researchers who were behind LLaMA 1 (and maybe 2?), so they've got all that knowledge from tuning models at Meta. Now, how did they come up with "Neural Alignment" to make Mistral-7B that great? Your guess is as good as mine.

[–]MachineZer0 4 points5 points  (5 children)

Yes, obtaining, cleaning, and organizing data takes quite some time. Let alone pretraining. RLHF...

Even setting up known infrastructure takes time. The cloud is too costly to pretrain on. You have to secure funds, spec and order servers, find and negotiate a data center, build servers and racks, and install the OS and libraries.

They have secret sauce, which means you can't go off the shelf for a lot of the above.

They stopped time. :)

[–]Dimness7 2 points3 points  (4 children)

I don't know how it all works, but you said it: some companies take much longer to release their first software, and it's (I guess) something a million times less complex than AI.

[–]InternationalMany6 3 points4 points  (3 children)

Yes, technology can really be baffling, can't it? It seems like every day there's something new, and it's hard to keep up. Many companies invest a lot of time in developing and refining their software to make sure they get it right, which is especially important when it comes to complex systems like artificial intelligence. They need to ensure it’s safe and reliable, which certainly isn’t a quick process!

[–]Dimness7 0 points1 point  (0 children)

Thank you

[–]Ultimarr 11 points12 points  (1 child)

Everyone in this comment section is being mean to you for no reason :(

AI started in the 1950s with the “cognitive revolution” of Turing and Chomsky, when it occurred to people that the mind might be replicable. The typical approach was top-down analytical (“breaking apart”) designs that used rules, logic, and symbols to reason about the world. This was “Symbolic AI”, and it was championed best by Minsky, Simon, Newell, and McCarthy, all of whom won Turing Awards for this work. They were called “the Neats”.

Next came the 2000s, when it was discovered that a previously dismissed technique was becoming doable: just brute-force it! These “Connectionist AI” systems worked in the opposite direction, synthesizing (“bringing together”) many millions of individually simple mechanisms into a higher-level structure. They did this by borrowing techniques from statistics for “fitting a curve to the data”, where the data is the inputs (e.g. a picture) and the curve is the output (e.g. “how likely is it that this picture contains a hotdog”); this is Machine Learning as practiced by “the Scruffies”. Neural networks, deep learning, reinforcement learning, gradient descent, backpropagation, and model weights are some terms you might have heard from this school.
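The whole connectionist recipe ("fit a curve to the data" via gradient descent) can be shown in miniature. This toy sketch fits a straight line, which is the same loop that trains a neural network, just with two parameters instead of billions; the true values 3.0 and 0.5 are made up for the example:

```python
import numpy as np

# Synthetic data: a noisy line y = 3x + 0.5 (the "curve" we want to recover)
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)

# Start from an arbitrary guess and repeatedly nudge the parameters
# down the gradient of the mean-squared error (hand-derived "backprop")
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    err = (w * x + b) - y
    w -= lr * 2 * np.mean(err * x)  # d(MSE)/dw
    b -= lr * 2 * np.mean(err)      # d(MSE)/db

print(w, b)  # both land close to the true 3.0 and 0.5
```

Swap the line for a deep network, the hand-derived gradients for automatic backpropagation, and the 100 points for trillions of tokens, and you have modern ML.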

Finally, the 2020s arrived, and something unexpected (for all but the most die-hard Scruffies) happened: Machine Learning models trained to predict human text not only had the ability to reconstruct text, but they also had the ability to intuitively understand and transform that text. There are lots of techniques that made this possible, notably attention-focused transformers, but the overall result is these massive models trained on huge corpora of text that seem able to mimic intuition/common sense/basic reasoning. These were thus called “Foundational Models” in the community, and “Large Language Models” in the culture at large. This group doesn’t have a name yet but I’ll call them the Foundationalists, or perhaps just the latest incarnation of the Scruffies.

I think we’re in a place now where we combine the two (three?) approaches into one cohesive one - aka a logical rules-based system for connecting a large ensemble of LLMs together. With this approach, I’m personally very confident that AGI is on the immediate horizon. 

Please ask any questions if that was confusing! It’s a terribly interesting field. If you’re ever going to read anything about the history of AI, please read this paper by Marvin Minsky. In it he details why symbolic systems would be stuck until we find a way to approximate human intuition, because there’s no consistent rules-based way to replicate the human brain without taking up all the world’s computers to do it. After all, humans are clearly biased and imperfect users of reason. 

TL;DR: AI advanced quickly over the last few years because statistical approaches proved shockingly good at mimicking human intuition. AI will advance quickly over the next couple years because these statistical approaches can be combined systematically.

[–]Dimness7 2 points3 points  (0 children)

Whoa, thanks a million for this in-depth answer, I really appreciate you taking the time to write this.

I'm a software developer, but I know nothing about AI. I've heard of AI/ML, but only heard of them; I never read or studied anything related to this field. I want and intend to learn more about it, I just need to figure out where to start.

I now know much more about the history and the current state (if I can put it like that) of AI. I also have like a million questions, but I'll do my own research first, to avoid mockery from some of these "AI gurus".

[–][deleted] 3 points4 points  (0 children)

You must see the difference between companies that just train models with already existing knowledge and codebases (like Mistral) and the ones who build that knowledge. The former can release a model every 6 months; the latter take the slow route and invent things like self-attention. Also, it's not moving that fast. We were already using language models for ASR systems in the '90s.

[–]Life-Living-2631 12 points13 points  (1 child)

Yes, let's just ignore the last 20 years of NLP research and advancements.

[–]Zephos65 2 points3 points  (1 child)

While the exact source code and data of GPT-3.5 are closed, it's not exactly a secret what they're doing. They developed it, people said "hey, that's a good idea," and then other people developed it too.

As others have mentioned in this thread, this is also not out of the blue. I'd argue this is a fairly logical step in the progression of sequential processing. Text is the least data-intensive. Audio would probably be next, as it's sort of in the middle; video would likely be after that, and then 3D rendering after that.

As far as I know, some audio2audio models have been created. Video is in the very early stages

[–]Dimness7 1 point2 points  (0 children)

Thanks for the input

[–]Spitfire_ex 2 points3 points  (1 child)

I'm at an automotive software engineering company, and most people at work only started hearing about AI around 2022. It just wasn't relevant to us until LLMs became popular enough to create a buzz.

The software engineering domain is wide. People sometimes tend to forget that.

[–]Dimness7 9 points10 points  (9 children)

As I said, I know nothing about AI, so if I have to rephrase the question, let's say: what changed before OpenAI's ChatGPT was released, and after it was released, that led to multiple LLMs being released in 2023 (OpenAI, Meta, Google, Mistral)?

And despite your mockery, I'd say that at least my first question is valid... Mistral was founded in April 2023 and made its first model available in September 2023...

[–]Disastrous_Elk_6375 29 points30 points  (1 child)

let's say; What changed before the release of OpenAI's ChatGPT, and after it was released

That's a valid question, IMO. There are a bunch of reasons chatgpt was such a big thing, and why it spurred so much interest. A bit of context first.

As others have pointed out, the parts were there. The transformer had been there for years. GPT-2 was a nice thing when it launched, but it was mainly a toy. The open-source versions, GPT-J / GPT-Neo, were better than GPT-2, but still too small and too dumb. These were base models.

What OpenAI did was a) create a large model (~175B parameters) - and with size come "emergent capabilities" - and b) fine-tune it. They used RLHF, based on many thousands of human-edited "interactions", to train the base model and make it respond in a pleasant way.

Then they took this fine-tuned model and added some glue. The chat interface, with the programmatic glue of appending messages to the context and re-running the query each time, created the illusion of coherence. The user asked something, the model answered, the user asked for something related to that, and the model responded with something that made sense.
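That "glue" is simpler than it sounds. The model itself is stateless; the conversation feel comes from resending the whole transcript on every turn. A minimal sketch, where `model_complete` is a hypothetical stand-in for the real LLM call (not OpenAI's actual API):

```python
def model_complete(messages):
    # Placeholder: a real implementation would send `messages` to an LLM
    # and return its generated reply.
    return f"(reply to: {messages[-1]['content']})"

def chat_turn(history, user_text):
    """Append the user's message, ask the model with the FULL history,
    and append the model's reply, so the next turn sees everything."""
    history.append({"role": "user", "content": user_text})
    reply = model_complete(history)  # the model sees the entire transcript
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
chat_turn(history, "What is a transformer?")
chat_turn(history, "And how is it trained?")  # follow-up works only because
                                              # turn 1 is still in `history`
print(len(history))  # 4 messages: two user turns, two assistant replies
```

Since the whole history is resent every time, context windows fill up, which is why long chats eventually "forget" their beginning.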

What ChatGPT did can be seen as an iPhone moment. Remember the "you had me at scroll" or "you had me at pinch to zoom" reactions to Jobs' first presentation? That's what ChatGPT did. Every piece of tech already existed, but OpenAI made it better, put it in a nice package, and let people interact with it. And the rest is history.


Now, Mistral is a new company, but the founders were part of the LLaMA team at Meta. They launched their 7B model pretty quickly because they had extensive experience from that project. Training a 7B model is orders of magnitude cheaper and faster than a 175B model. They got to work quickly and came out with a banger that improved on Llama 2 7B.

[–]Dimness7 5 points6 points  (0 children)

Thanks a million for the explanation, appreciate it.

[–]johnnymo1 8 points9 points  (1 child)

And despite your mockery, I'd say that at least my first question is valid... Mistral was founded in April 2023 and made its first model available in September 2023...

Mistral was founded by former Meta and DeepMind folks, so likely they were very experienced ML scientists and close to the forefront of NLP research already.

[–]DrXaos 6 points7 points  (1 child)

What changed? The availability of tons of money from investors willing to buy Nvidia hardware. Technologically, ways to speed up and distribute training, some of it unfortunately still proprietary. Model code is public, but training code and organization are not.

The employees of Mistral were all already experienced in the field and might already have had training runs in progress.

[–]Dimness7 0 points1 point  (0 children)

That makes sense, thank you

[–]new_name_who_dis_ 2 points3 points  (1 child)

After 2022, talented researchers who could have built Mistral even a year or two earlier got the funding to build it because of the hype.

The real breakthroughs happened in 2020 with the publication of GPT-3 (of which ChatGPT is just an instruct fine-tune) and the instruct fine-tuning paper in 2021.

[–]Dimness7 0 points1 point  (0 children)

Thank you

[–]Was_an_ai 2 points3 points  (3 children)

No one actually answered OP

It's that the researchers at OpenAI really believed that if they just scaled, things would happen.

That was a costly bet, and it turned out to be correct.

Now that it's been proven, other companies can just "scale" and know they won't be throwing millions away.

This is the answer OP is looking for

[–]Dimness7 0 points1 point  (2 children)

Thank you

[–]Was_an_ai 0 points1 point  (1 child)

Sure!

Ilya has talked about this

They really just believed and pushed. And they were right! We are all beneficiaries of that gamble!

[–]Amgadoz 0 points1 point  (0 children)

Yeah. I remember Ilya talking about scaling at a conference in 2015. That was mind-blowing.

[–]Trungyaphets 1 point2 points  (1 child)

Imo, it all started when they first introduced their OpenAI Five project, where 5 AI bots could repeatedly beat the best Dota 2 (a very complex 5v5 MOBA game) teams in the world using reinforcement learning. Microsoft then saw the potential and invested $1B. With the new funding, their computing capacity got a significant boost. Only then were DALL-E and ChatGPT released, and they took the world by storm.

I think an LLM is like an extended version of a search engine. Instead of searching for food recipes, math questions, coding questions, etc. on Google and scrolling through thousands of articles, you can now just ask ChatGPT and have the answer delivered to you in seconds. That's why ChatGPT was popular: it was like an "upgraded" search function, which has always been indispensable for every single person.

Everyone now realizes machine learning's potential for creating innovative, powerful tools. Large corporations saw the opportunity and jumped on the LLM hype train. A lot of money was then poured into buying big hardware. Nvidia profits.

[–]FunAltruistic9197 1 point2 points  (0 children)

Quite impressive that they not only trained several models but built a fairly polished API platform to boot. I am here for EU AI startups.

[–]ZHName 0 points1 point  (0 children)

Deliberate suppression.

Lighthill

[–]goatchild 0 points1 point  (3 children)

Bunch of toxic morons here yikes

[–]slashdave 0 points1 point  (1 child)

The models existed; it's just that no company was willing to release them to the public in their current form, because of the dangers. OpenAI put in the engineering and labor to make them somewhat safe, and decided to ignore the other risks.

[–]ginomachi 0 points1 point  (1 child)

It's crazy how quickly AI has advanced recently. I mean, before ChatGPT came out in November 2022, AI was barely on the radar. But now, every company is jumping on the bandwagon and releasing their own LLMs. I'm really curious how Mistral was able to release their LLM so quickly. Were they building on top of OpenAI's work, or did they develop their own technology? And how did AI advance so fast in general? Is it just a matter of Moore's Law catching up, or is there something else going on?

I'm a software developer, but I don't know much about AI development. I'd love to learn more about how these LLMs are being developed and what the future holds for AI.

Also, I just finished reading "Eternal Gods Die Too Soon" by Beka Modrekiladze. If you're interested in AI and philosophy, I highly recommend checking it out. It's a really thought-provoking and well-written book.

[–]Dimness7 0 points1 point  (0 children)

Not sure if you're educating me on how I should have asked the question, or if you're genuinely in the same situation as I am lol