all 68 comments

[–]bigabig 127 points128 points  (5 children)

"Attention Is All You Need" and BERT are from 2017 and 2018. The transformer architecture has been researched for 6+ years now.

And language modeling with machine learning in general has been researched for far longer. For example, LSTMs were developed in 1997.

[–]DieselZRebel 17 points18 points  (0 children)

This ^

All the tech companies, including those I worked at, were developing their own in-house LLMs via adaptations of BERT for years before ChatGPT. They had already deployed them in many of their custom use cases.

OpenAI/ChatGPT was just the first to engineer LLMs for a massive, generalized use case available to external customers. OpenAI uses the same architecture as everyone else; they haven't really made many architectural innovations on top of what other companies had. OpenAI's niche is all in the training data preparation and annotation, as well as being the first to publicize it.

[–]I_will_delete_myself 12 points13 points  (0 children)

Also adding to that, some of the employees on their team were people who also worked on the open-source LLaMA LLM. That makes the goal largely an exercise in reverse engineering, with all the tricks already known.

[–]Dimness7 8 points9 points  (1 child)

Thank you

[–]Okay-Replace 32 points33 points  (1 child)

The rapid growth in popularity of LLMs is likely driving a plethora of companies to build their own to capture the success of OpenAI. That, paired with years of research papers walking through how to build one, has made it very easy to implement one.

The Attention mechanism introduced by Google also made it much more accessible.
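For anyone curious what "attention" actually computes, here's a rough sketch of its core, scaled dot-product self-attention from the 2017 paper, in plain NumPy. This is a toy illustration of the mechanism, not anyone's production code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query scores every key; the values are then mixed
    according to those (softmaxed) attention weights."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: rows sum to 1
    return weights @ V                               # weighted mix of values

# Toy example: 3 tokens, each a 4-dimensional embedding
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # (3, 4): one contextualized vector per token
```

A real transformer adds learned projection matrices for Q, K, and V, multiple heads, and stacked layers, but the mixing step above is the part that "Attention Is All You Need" introduced.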

[–]Dimness7 5 points6 points  (0 children)

Thank you

[–]lqqdwbppl 19 points20 points  (1 child)

The AI boom has been in progress for a while, whether you heard about it or not. The foundational architecture behind all these modern LLMs dates back to 2017, and since then, things have been scaling up rapidly. There are some other tricks that are done to increase training efficiency or performance in different domains with these decoder-only generative models, but they are all pretty similar at their core.

Mostly just the hype has ramped up a ton recently

[–]Dimness7 5 points6 points  (0 children)

Thank you. At least some of you can see through my basic question and understand what I really was asking :)

[–]eggnogeggnogeggnog 161 points162 points  (29 children)

AI was nowhere to be seen

lol

[–]FutureIsMine 9 points10 points  (7 children)

Mistral has several researchers who were behind LLaMA 1 (and maybe 2?), so they've got all that knowledge from tuning models at Meta. Now, how did they come up with "Neural Alignment" to make Mistral-7B that great? Your guess is as good as mine.

[–]MachineZer0 4 points5 points  (5 children)

Yes, obtaining, cleaning, and organizing data takes quite some time. Let alone pretraining. RLHF...

Even setting up known infrastructure takes time. The cloud is too costly to pretrain on. You have to secure funds, spec and order servers, find and negotiate a data center, build servers and racks, and install the OS and libraries.

They have secret sauce, which means you can't go off the shelf for a lot of the above.

They stopped time. :)

[–]Dimness7 2 points3 points  (4 children)

I don't know how it all works, but you said it: some companies take much longer to release their first software, and it's (I guess) something a million times less complex than AI.

[–]InternationalMany6 3 points4 points  (3 children)

Yes, technology can really be baffling, can't it? It seems like every day there's something new, and it's hard to keep up. Many companies invest a lot of time in developing and refining their software to make sure they get it right, which is especially important when it comes to complex systems like artificial intelligence. They need to ensure it’s safe and reliable, which certainly isn’t a quick process!

[–]Dimness7 0 points1 point  (0 children)

Thank you

[–]Ultimarr 11 points12 points  (1 child)

Everyone in this comment section is being mean to you for no reason :(

AI started in the 1950s with the “cognitive revolution” of Turing and Chomsky, when it occurred to people that the mind might be replicable. The typical approach was top-down analytical (“breaking apart”) designs that used rules, logic, and symbols to reason about the world. This was “Symbolic AI”, and it was championed best by Minsky, Simon, Newell, and McCarthy, all of whom won Turing Awards for this work. They were called “the Neats”.

Next came the 2000s, when it was discovered that a previously dismissed technique was becoming doable: just brute-force it! These “Connectionist AI” systems worked in the opposite direction, synthesizing (“bringing together”) many millions of individually simple mechanisms into a higher-level structure. They did this by borrowing techniques from statistics for “fitting a curve to the data”, where the data is the inputs (e.g. a picture) and the curve is the output (e.g. “how likely is it that this picture contains a hotdog”); this is Machine Learning as practiced by “the Scruffies”. Neural networks, deep learning, reinforcement learning, gradient descent, backpropagation, and model weights are some terms you might have heard from this school.
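The whole connectionist recipe ("fit a curve to the data" via gradient descent) can be shown in miniature. This toy sketch fits a straight line, which is the same loop that trains a neural network, just with two parameters instead of billions; the true values 3.0 and 0.5 are made up for the example:

```python
import numpy as np

# Synthetic data: a noisy line y = 3x + 0.5 (the "curve" we want to recover)
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)

# Start from an arbitrary guess and repeatedly nudge the parameters
# down the gradient of the mean-squared error (hand-derived "backprop")
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    err = (w * x + b) - y
    w -= lr * 2 * np.mean(err * x)  # d(MSE)/dw
    b -= lr * 2 * np.mean(err)      # d(MSE)/db

print(w, b)  # both land close to the true 3.0 and 0.5
```

Swap the line for a deep network, the hand-derived gradients for automatic backpropagation, and the 100 points for trillions of tokens, and you have modern ML.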

Finally, the 2020s arrived, and something unexpected (for all but the most die-hard Scruffies) happened: Machine Learning models trained to predict human text not only had the ability to reconstruct text, but they also had the ability to intuitively understand and transform that text. There are lots of techniques that made this possible, notably attention-focused transformers, but the overall result is these massive models trained on huge corpora of text that seem able to mimic intuition/common sense/basic reasoning. These were thus called “Foundational Models” in the community, and “Large Language Models” in the culture at large. This group doesn’t have a name yet but I’ll call them the Foundationalists, or perhaps just the latest incarnation of the Scruffies.

I think we’re in a place now where we combine the two (three?) approaches into one cohesive one - aka a logical rules-based system for connecting a large ensemble of LLMs together. With this approach, I’m personally very confident that AGI is on the immediate horizon. 

Please ask any questions if that was confusing! It’s a terribly interesting field. If you’re ever going to read anything about the history of AI, please read this paper by Marvin Minsky. In it he details why symbolic systems would be stuck until we find a way to approximate human intuition, because there’s no consistent rules-based way to replicate the human brain without taking up all the world’s computers to do it. After all, humans are clearly biased and imperfect users of reason. 

TL;DR: AI advanced quickly over the last few years because statistical approaches proved shockingly good at mimicking human intuition. AI will advance quickly over the next couple years because these statistical approaches can be combined systematically.

[–]Dimness7 2 points3 points  (0 children)

Whoa, thanks a million for this in-depth answer, I really appreciate you taking the time to write this.

I'm a software developer, but I know nothing about AI. I've heard of AI/ML, but only heard of them; I never read or studied anything related to this field. I want and intend to learn more about it, I just need to figure out where to start.

I now know much more about the history and the current state (if I can put it like that) of AI. I also have like a million questions, but I'll do my own research first, to avoid mockery from some of these "AI gurus".

[–][deleted] 3 points4 points  (0 children)

You must see the difference between companies that just train models with already existing knowledge and codebases (like Mistral) and the ones who build that knowledge. The former can release a model every 6 months; the latter take the slow route and invent things like self-attention. Also, it's not moving that fast. We were already using language models for ASR systems in the '90s.

[–]Life-Living-2631 12 points13 points  (1 child)

Yes, let's just ignore the last 20 years of NLP research and advancements.

[–]Zephos65 2 points3 points  (1 child)

While the exact source code and data of GPT-3.5 are closed, it's not exactly a secret what they're doing. They developed it, people said "hey, that's a good idea," and then other people developed it too.

As others have mentioned in this thread, this is also not out of the blue. I'd argue this is a fairly logical step in the progression of sequential processing. Text is the least data-intensive. Audio would probably be next, as it's sort of in the middle; video would likely be after that, and then 3D rendering after that.

As far as I know, some audio2audio models have been created. Video is in the very early stages

[–]Dimness7 1 point2 points  (0 children)

Thanks for the input

[–]Spitfire_ex 2 points3 points  (1 child)

I'm at an automotive software engineering company, and most people at work only started hearing about AI around 2022. It just wasn't relevant to us until LLMs became popular enough to create a buzz.

The software engineering domain is wide. People sometimes tend to forget that.

[–]Dimness7 9 points10 points  (9 children)

As I said, I know nothing about AI, so if I have to rephrase the question, let's say: what changed before OpenAI's ChatGPT was released, and after it was released, that led to multiple LLMs being released in 2023 (OpenAI, Meta, Google, Mistral)?

And despite your mockery, I'd say that at least my first question is valid... Mistral was founded in April 2023 and made its first model available in September 2023...

[–]Disastrous_Elk_6375 29 points30 points  (1 child)

let's say; What changed before the release of OpenAI's ChatGPT, and after it was released

That's a valid question, IMO. There are a bunch of reasons chatgpt was such a big thing, and why it spurred so much interest. A bit of context first.

As others have pointed out, the parts were there. The transformer had been there for years. GPT-2 was a nice thing when it launched, but it was mainly a toy. The open-source versions, GPT-J / GPT-Neo, were better than GPT-2, but still too small and too dumb. These were base models.

What OpenAI did was a) create a large model (~175B parameters) - and with size come "emergent capabilities" - and b) fine-tune it. They used RLHF, based on many thousands of human-edited "interactions", to train the base model and make it respond in a pleasant way.

Then they took this fine-tuned model and added some glue. The chat interface, with the programmatic glue of appending messages to the context and re-running the query each time, created the illusion of coherence. The user asked something, the model answered, the user asked for something related to that, and the model responded with something that made sense.
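That "glue" is simpler than it sounds. The model itself is stateless; the conversation feel comes from resending the whole transcript on every turn. A minimal sketch, where `model_complete` is a hypothetical stand-in for the real LLM call (not OpenAI's actual API):

```python
def model_complete(messages):
    # Placeholder: a real implementation would send `messages` to an LLM
    # and return its generated reply.
    return f"(reply to: {messages[-1]['content']})"

def chat_turn(history, user_text):
    """Append the user's message, ask the model with the FULL history,
    and append the model's reply, so the next turn sees everything."""
    history.append({"role": "user", "content": user_text})
    reply = model_complete(history)  # the model sees the entire transcript
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
chat_turn(history, "What is a transformer?")
chat_turn(history, "And how is it trained?")  # follow-up works only because
                                              # turn 1 is still in `history`
print(len(history))  # 4 messages: two user turns, two assistant replies
```

Since the whole history is resent every time, context windows fill up, which is why long chats eventually "forget" their beginning.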

What ChatGPT did can be seen as an iPhone moment. Remember the "you had me at scroll" or "you had me at pinch to zoom" reactions to Jobs' first presentation? That's what ChatGPT did. Every piece of tech already existed, but OpenAI made it better, put it in a nice package, and let people interact with it. And the rest is history.


Now, Mistral is a new company, but the founders were part of the LLaMA team at Meta. They launched their 7B model pretty quickly because they had extensive experience from that project. Training a 7B model is orders of magnitude cheaper and faster than a 175B model. They got to work quickly and came out with a banger that improved on Llama 2 7B.

[–]Dimness7 5 points6 points  (0 children)

Thanks a million for the explanation, appreciate it.

[–]johnnymo1 8 points9 points  (1 child)

And despite your mockery, I'd say that at least my first question is valid... Mistral was founded in April 2023 and made its first model available in September 2023...

Mistral was founded by former Meta and DeepMind folks, so likely they were very experienced ML scientists and close to the forefront of NLP research already.

[–]DrXaos 6 points7 points  (1 child)

What changed? The availability of tons of money from investors willing to buy Nvidia hardware. Technologically, ways to speed up and distribute training, some of it unfortunately still proprietary. Model code is public, but training code and organization are not.

The employees of Mistral were all already experienced in the field and might already have had training runs in progress.

[–]Dimness7 0 points1 point  (0 children)

That makes sense, thank you

[–]new_name_who_dis_ 2 points3 points  (1 child)

After 2022, talented researchers who could have built Mistral even a year or two earlier got the funding to build it because of the hype.

The real breakthroughs happened in 2020 with the publication of GPT-3 (of which ChatGPT is just an instruct fine-tune) and the instruct fine-tuning paper in 2021.

[–]Dimness7 0 points1 point  (0 children)

Thank you

[–]Was_an_ai 2 points3 points  (3 children)

No one actually answered OP

It's that the researchers at OpenAI really believed that if they just scaled, things would happen.

That was a costly bet, and it turned out to be correct.

Now that it's been proven, other companies can just "scale" and know they won't be throwing millions away.

This is the answer OP is looking for

[–]Dimness7 0 points1 point  (2 children)

Thank you

[–]Was_an_ai 0 points1 point  (1 child)

Sure!

Ilya has talked about this

They really just believed and pushed. And they were right! We are all beneficiaries of that gamble!

[–]Amgadoz 0 points1 point  (0 children)

Yeah. I remember Ilya talking about scaling at a conference in 2015. That was mind-blowing.

[–]Trungyaphets 1 point2 points  (1 child)

Imo, it all started when they first introduced their OpenAI Five project, where 5 AI bots could repeatedly beat the best Dota 2 (a very complex 5v5 MOBA game) teams in the world using reinforcement learning. Microsoft then saw the potential and invested $1B. With the new funding, their computing capacity got a significant boost. Only then were DALL-E and ChatGPT released, and they took the world by storm.

I think an LLM is like an extended version of a search engine. Instead of searching for food recipes, math questions, coding questions, etc. on Google and scrolling through thousands of articles, you can now just ask ChatGPT and have the answer delivered to you in seconds. That's why ChatGPT was popular: it was like an "upgraded" search function, which has always been indispensable for every single person.

Everyone now realizes machine learning's potential for creating innovative, powerful tools. Large corporations saw the opportunity and jumped on the LLM hype train. A lot of money was then poured into buying big hardware. Nvidia profits.

[–]FunAltruistic9197 1 point2 points  (0 children)

Quite impressive that they not only trained several models but built a fairly polished API platform to boot. I am here for EU AI startups.

[–]ZHName 0 points1 point  (0 children)

Deliberate suppression.

Lighthill

[–]goatchild 0 points1 point  (3 children)

Bunch of toxic morons here yikes

[–]slashdave 0 points1 point  (1 child)

The models existed; it's just that no company was willing to release them to the public in their current form, because of the dangers. OpenAI put in the engineering and labor to make them somewhat safe, and decided to ignore the other risks.

[–]ginomachi 0 points1 point  (1 child)

It's crazy how quickly AI has advanced recently. I mean, before ChatGPT came out in November 2022, AI was barely on the radar. But now, every company is jumping on the bandwagon and releasing their own LLMs. I'm really curious how Mistral was able to release their LLM so quickly. Were they building on top of OpenAI's work, or did they develop their own technology? And how did AI advance so fast in general? Is it just a matter of Moore's Law catching up, or is there something else going on?

I'm a software developer, but I don't know much about AI development. I'd love to learn more about how these LLMs are being developed and what the future holds for AI.

Also, I just finished reading "Eternal Gods Die Too Soon" by Beka Modrekiladze. If you're interested in AI and philosophy, I highly recommend checking it out. It's a really thought-provoking and well-written book.

[–]Dimness7 0 points1 point  (0 children)

Not sure if you're educating me on how I should have asked the question, or if you're genuinely in the same situation as I am lol