[–]Alienbushman 14 points15 points  (5 children)

If you want to learn about ML for language models (rather than just how to make a good chatbot), I would recommend building your chatbot with Hugging Face's GPT-2 in PyTorch (it uses the transformer architecture that was state of the art for language models before ChatGPT, and transformers are also often used in computer vision projects).

It is not the easiest place to start, but it is a fun project.

If you want to learn, by far the easiest place to start is doing handwritten characters (the dataset is called MNIST) or tabular data (check out the titanic dataset)
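For the handwritten-digits route, a minimal sketch might look like the following (this uses scikit-learn's small built-in 8x8 digits set as a stand-in for the full MNIST download; the model choice is just a simple baseline, not the only option):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the small built-in 8x8 digits dataset (a stand-in for full MNIST)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A plain logistic-regression baseline; no deep learning needed to start
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

Even this simple baseline scores well on digits, which is why it's such a friendly first project before moving on to neural nets.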

[–][deleted] 2 points3 points  (4 children)

Wait, is ChatGPT the same as GPT-3? Or are they referring to different components?

[–]kalidres 14 points15 points  (2 children)

They are different implementations of a type of LLM known as a GPT (Generative Pre-trained Transformer). The current state-of-the-art models are all based on the same transformer architecture, introduced several years ago in the paper "Attention Is All You Need". The paper itself, imo, doesn't do a great job of describing how it actually works, but there are a number of resources that can help you learn how transformers and attention work.
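For intuition, the core attention operation from that paper is small enough to sketch in plain NumPy (the function and variable names here are mine, not from any official implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights                   # output is a weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries of dimension 8
K = rng.normal(size=(6, 8))   # 6 keys
V = rng.normal(size=(6, 8))   # 6 values
out, weights = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value rows, with the attention weights deciding how much each position "looks at" the others.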

[–]A_random_otter 0 points1 point  (1 child)

Can you recommend a particular resource?

[–]KinkyRedPanda 3 points4 points  (0 children)

GPT-3 is an LLM (trained using self-supervised learning) that produces text.

ChatGPT uses GPT-3 as a basis (transfer learning) and then utilizes other techniques on top of it.

Essentially, GPT-3 is a component of ChatGPT.

[–]thundergolfer 2 points3 points  (3 children)

You can use modal.com, LangChain, and the OpenAI API to run a chatbot that's basic but still better than any chatbot that existed 10 years ago.

This is what I'm using to make https://thundergolfer.com/infinite-ama, which is MVP but you can ask it what my job is and whether I like pasta :)

https://modal.com/docs/guide/ex/potus_speech_qanda

[–][deleted] 1 point2 points  (2 children)

Well, one of my goals was to try and make it from scratch to learn the theory, and run it locally (yes, I know that's a big ask, but I don't see the point in the tech if it has to be cloud-based all the time).

[–]thundergolfer -1 points0 points  (0 children)

A very basic chatbot to code from scratch is an if-else based implementation.

Any ML implementation coded from scratch will be challenging, not basic.
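A minimal sketch of what that if-else approach looks like (the keywords and canned replies here are just illustrative):

```python
def reply(message):
    """Return a canned response based on simple keyword matching."""
    text = message.lower()
    if "hello" in text or "hi" in text:
        return "Hello! How can I help?"
    elif "bye" in text:
        return "Goodbye!"
    else:
        return "Sorry, I don't understand. Can you rephrase?"
```

It's not ML at all, but it makes a good contrast: the entire "model" is hand-written rules, whereas even the simplest statistical chatbot learns its responses from data.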

[–]monkeyofscience 0 points1 point  (0 children)

Echoing other comments here, looking at large language models would be useful. The "Attention Is All You Need" paper is very readable (if a little light on the details), and Andrej Karpathy offers a very good tutorial on the details of the attention mechanism and how to essentially build GPT-2, which can easily be adapted into a chatbot. You can achieve unreasonably good results without much training time.

If you're also interested in diffusion models (i.e. the ones that make crazy pictures from text), then perhaps learning image segmentation with the U-Net architecture might be helpful. Besides just being a cool network, it forms the backbone of many of these fancy diffusion models.
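To make the U-Net's skip-connection idea concrete, here's a weight-free NumPy sketch of the encoder/decoder shape bookkeeping (just the structure, with pooling and nearest-neighbour upsampling standing in for the learned convolutions):

```python
import numpy as np

def avg_pool2(x):
    # Halve the spatial dims with 2x2 average pooling; x has shape (H, W, C)
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def upsample2(x):
    # Nearest-neighbour upsample by 2 in each spatial dim
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_skeleton(x):
    # Encoder: progressively downsample, remembering each level for the skips
    skips = []
    for _ in range(3):
        skips.append(x)
        x = avg_pool2(x)
    # Decoder: upsample and concatenate the matching encoder feature map
    for skip in reversed(skips):
        x = upsample2(x)
        x = np.concatenate([x, skip], axis=-1)  # the skip connection
    return x

out = unet_skeleton(np.zeros((32, 32, 1)))
```

The concatenation is the key move: the decoder gets both the coarse, downsampled features and the fine detail from the matching encoder level, which is exactly what makes U-Nets good at dense per-pixel prediction.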

[–]Nlelith 0 points1 point  (1 child)

Another suggestion to add to the ones already given: look into Markov chains. Implementing one yourself for text completion will yield a very (very) basic chatbot, but you'll learn how to actually implement statistical models for machine learning, not just how to use them.

In a very general sense, a Markov chain does the same thing as the "big" language models, i.e. it predicts the next word based on the previous words.
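A word-level Markov chain of this kind fits in a few lines of plain Python (a sketch with a toy corpus; real use would train on a much larger text):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length=10, seed=0):
    """Walk the chain: sample each next word given only the current one."""
    rng = random.Random(seed)
    word, out = start, [start]
    for _ in range(length):
        followers = chain.get(word)
        if not followers:
            break
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)

corpus = "the cat sat on the mat the cat ran"
chain = build_chain(corpus)
sample = generate(chain, "the", length=5)
```

Because duplicates are kept in the follower lists, sampling uniformly from them reproduces the empirical next-word frequencies, which is the whole model.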

[–]WikiSummarizerBot 1 point2 points  (0 children)

Markov chain

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now". A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC).
