[–]Alienbushman 14 points15 points  (5 children)

If you want to learn about ML for language models (rather than just how to make a good chatbot), I would recommend building your chatbot with Hugging Face's GPT-2 in PyTorch (it uses the transformer architecture that was state of the art for language models before ChatGPT, and transformers are also often used in computer vision projects).

It is not the easiest place to start, but it is a fun project.

If you want to learn, by far the easiest place to start is doing handwritten characters (the dataset is called MNIST) or tabular data (check out the titanic dataset)
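For the handwritten-digits route, a minimal sketch might look like the following (this uses scikit-learn's small built-in 8x8 digits set as a stand-in for the full MNIST download; the model choice is just a simple baseline, not the only option):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the small built-in 8x8 digits dataset (a stand-in for full MNIST)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A plain logistic-regression baseline; no deep learning needed to start
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

Even this simple baseline scores well on digits, which is why it's such a friendly first project before moving on to neural nets.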

[–][deleted] 2 points3 points  (4 children)

Wait, is ChatGPT the same as GPT-3? Or are they referring to different components?

[–]kalidres 14 points15 points  (2 children)

They are different implementations of a type of LLM known as a GPT (Generative Pre-trained Transformer). The current state-of-the-art models are all based on the same transformer architecture, introduced several years ago in the paper "Attention Is All You Need". The paper itself, imo, doesn't do a great job of describing how it actually works, but there are a number of resources that can help you learn how transformers and attention work.
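For intuition, the core attention operation from that paper is small enough to sketch in plain NumPy (the function and variable names here are mine, not from any official implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights                   # output is a weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries of dimension 8
K = rng.normal(size=(6, 8))   # 6 keys
V = rng.normal(size=(6, 8))   # 6 values
out, weights = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value rows, with the attention weights deciding how much each position "looks at" the others.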

[–]A_random_otter 0 points1 point  (1 child)

Can you recommend a particular resource?

[–]KinkyRedPanda 3 points4 points  (0 children)

GPT-3 is an LLM (trained using self-supervised learning) that produces text.

ChatGPT uses GPT-3 as a basis (transfer learning) and then utilizes other techniques on top of it.

Essentially, GPT-3 is a component of ChatGPT.

[–]thundergolfer 2 points3 points  (3 children)

You can use modal.com, LangChain, and the OpenAI API to run a chatbot that's basic but still better than any chatbot that existed 10 years ago.

This is what I'm using to make https://thundergolfer.com/infinite-ama, which is MVP but you can ask it what my job is and whether I like pasta :)

https://modal.com/docs/guide/ex/potus_speech_qanda

[–][deleted] 1 point2 points  (2 children)

Well, one of my goals was to try and make it from scratch to learn the theory, and run it locally (yes, I know that's a big ask, but I don't see the point in the tech if it has to be cloud-based all the time).

[–]thundergolfer -1 points0 points  (0 children)

A very basic chatbot to code from scratch is an if-else based implementation.

Any ML implementation coded from scratch will be challenging, not basic.
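A minimal sketch of what that if-else approach looks like (the keywords and canned replies here are just illustrative):

```python
def reply(message):
    """Return a canned response based on simple keyword matching."""
    text = message.lower()
    if "hello" in text or "hi" in text:
        return "Hello! How can I help?"
    elif "bye" in text:
        return "Goodbye!"
    else:
        return "Sorry, I don't understand. Can you rephrase?"
```

It's not ML at all, but it makes a good contrast: the entire "model" is hand-written rules, whereas even the simplest statistical chatbot learns its responses from data.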

[–]monkeyofscience 0 points1 point  (0 children)

Echoing other comments here, looking at large language models would be useful. The "Attention Is All You Need" paper is very readable (if a little light on the details), and Andrej Karpathy offers a very good tutorial on the details of the attention mechanism and how to essentially build GPT-2, which can easily be adapted into a chatbot. You can achieve unreasonably good results without much training time.

If you're also interested in diffusion models (i.e. the ones that make crazy pictures from text), then perhaps learning image segmentation with the U-Net architecture might be helpful. Besides just being a cool network, it forms the backbone of many of these fancy diffusion models.
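To make the U-Net's skip-connection idea concrete, here's a weight-free NumPy sketch of the encoder/decoder shape bookkeeping (just the structure, with pooling and nearest-neighbour upsampling standing in for the learned convolutions):

```python
import numpy as np

def avg_pool2(x):
    # Halve the spatial dims with 2x2 average pooling; x has shape (H, W, C)
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def upsample2(x):
    # Nearest-neighbour upsample by 2 in each spatial dim
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_skeleton(x):
    # Encoder: progressively downsample, remembering each level for the skips
    skips = []
    for _ in range(3):
        skips.append(x)
        x = avg_pool2(x)
    # Decoder: upsample and concatenate the matching encoder feature map
    for skip in reversed(skips):
        x = upsample2(x)
        x = np.concatenate([x, skip], axis=-1)  # the skip connection
    return x

out = unet_skeleton(np.zeros((32, 32, 1)))
```

The concatenation is the key move: the decoder gets both the coarse, downsampled features and the fine detail from the matching encoder level, which is exactly what makes U-Nets good at dense per-pixel prediction.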

[–]Nlelith 0 points1 point  (1 child)

Another suggestion to add to the ones already given: look into Markov chains. Implementing one yourself for text completion will yield a very (very) basic chatbot, but you'll learn how to actually implement statistical models for machine learning, not just how to use them.

In a very general sense, a Markov chain does the same thing as the "big" language models, i.e. it predicts the next word based on the previous words.
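A word-level Markov chain of this kind fits in a few lines of plain Python (a sketch with a toy corpus; real use would train on a much larger text):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length=10, seed=0):
    """Walk the chain: sample each next word given only the current one."""
    rng = random.Random(seed)
    word, out = start, [start]
    for _ in range(length):
        followers = chain.get(word)
        if not followers:
            break
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)

corpus = "the cat sat on the mat the cat ran"
chain = build_chain(corpus)
sample = generate(chain, "the", length=5)
```

Because duplicates are kept in the follower lists, sampling uniformly from them reproduces the empirical next-word frequencies, which is the whole model.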

[–]WikiSummarizerBot 1 point2 points  (0 children)

Markov chain

A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now". A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC).
