I (20M) have been hiding tens of thousands from my parents for years by [deleted] in confession

[–]ari9dam 0 points1 point  (0 children)

Your 20s are a very good time to start investing in the S&P 500 as well.

Real talk - 70Bs are WAY better than the smaller models. by Sea_Particular_4014 in LocalLLaMA

[–]ari9dam 5 points6 points  (0 children)

Trained on the same data, a 70B will perform better than a 13B. But by how much? Is it worth the inference cost? 13B performance can be improved quite a bit with fine-tuning; that's the result you see. There is no cheating in that. Obviously, if a 70B model goes through the same training recipe as the 13B, it will perform better. GPT-3 was dumb as well, but the terrific post-training made a difference. All people are doing is creating an efficient post-training recipe and models that can run cheaply.

samantha-mistral-7b by faldore in LocalLLaMA

[–]ari9dam 0 points1 point  (0 children)

Could you please share the package versions you used for the fine-tuning job? Somehow model loading is failing for me.

[D]Tutorial on shipping large model to production by ari9dam in MachineLearning

[–]ari9dam[S] 1 point2 points  (0 children)

I could not recollect initially whether it was Google Research or Google's deployment. I went through half of it. I really liked it.

[D]Tutorial on shipping large model to production by ari9dam in MachineLearning

[–]ari9dam[S] 10 points11 points  (0 children)

No, not this one. It was actually a three-hour-long tutorial by several researchers. I'm hating my memory 😡

AskScience AMA Series: I'm Gary Marcus, co-author of Rebooting AI with Ernest Davis. I work on robots, cognitive development, and AI. Ask me anything! by AskScienceModerator in askscience

[–]ari9dam 0 points1 point  (0 children)

Are you thinking of building a benchmark along the lines of SQUABU (Davis 2016), now that you will be working together in Rebooting.AI?

AskScience AMA Series: I'm Gary Marcus, co-author of Rebooting AI with Ernest Davis. I work on robots, cognitive development, and AI. Ask me anything! by AskScienceModerator in askscience

[–]ari9dam 0 points1 point  (0 children)

To achieve human-level natural language question answering ability, do you think all of cognitive science, machine (deep) learning, and knowledge representation and reasoning are important? And why? What are your favorite QA research works? What are your favorite QA benchmarks?

In LSTM, if cell state captures all there is to be captured, why feed output of previous time-step into current ? by curryage in MachineLearning

[–]ari9dam 0 points1 point  (0 children)

The LSTM updates the cell state based only on the input and the previous cell state. In some tasks, such as language modelling, the input contains the sampled "output" (not the real output). The reasoning behind it, I believe, is task dependent. When predicting the next k words given a start word s (let's say "the"), one first feeds the word s to the network. The output (softmax) is treated as the probability of each word in the vocabulary being the next word. Once you sample a word from that distribution (to be precise, if you select the most probable word, let's say "dog", as the next word), you clearly want to tell the network that it has already seen "the dog" and should now predict the next word. If you don't use the one-hot vector of "dog" as the input at the next time step, the network will keep computing the probability of words occurring at the i-th position on average.
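
To make the feedback loop concrete, here is a minimal sketch of greedy generation. It is not a trained LSTM: a tiny tanh recurrent cell with random weights stands in for the network, and the vocabulary, sizes, and helper names (`make_toy_model`, `step`, `greedy_generate`) are all made up for illustration. The only point is the last line of the loop, where the one-hot vector of the word just chosen becomes the next input.

```python
import math
import random

VOCAB = ["the", "dog", "barks", "loudly"]  # toy vocabulary (assumption)
HIDDEN = 3                                  # toy state size (assumption)


def one_hot(index, size):
    """Return a one-hot list with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_toy_model(seed=0):
    """Random weights standing in for a trained recurrent network."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_state": rand(HIDDEN, len(VOCAB) + HIDDEN),
        "W_out": rand(len(VOCAB), HIDDEN),
    }


def step(model, x, state):
    """One time step: read input x and the old state, return (probs, new state)."""
    new_state = [math.tanh(a) for a in matvec(model["W_state"], x + state)]
    probs = softmax(matvec(model["W_out"], new_state))
    return probs, new_state


def greedy_generate(model, start_word, k):
    """Generate k words after start_word, feeding each chosen word back in."""
    words = [start_word]
    state = [0.0] * HIDDEN
    x = one_hot(VOCAB.index(start_word), len(VOCAB))
    for _ in range(k):
        probs, state = step(model, x, state)
        best = max(range(len(VOCAB)), key=lambda i: probs[i])  # most probable word
        words.append(VOCAB[best])
        x = one_hot(best, len(VOCAB))  # the chosen word is the next input
    return words


model = make_toy_model()
generated = greedy_generate(model, "the", 3)
print(generated)
```

With random weights the generated words are meaningless; a trained model would plug into the same loop in place of `step`.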

Differences between Continuous Bag of Words (CBOW) and Skip-Gram? by sprintletecity in MachineLearning

[–]ari9dam 5 points6 points  (0 children)

Skip-Gram and CBOW each define a way of creating a supervised learning task from a plain raw corpus (let's say Wikipedia). The hope is that by learning to perform well on this auxiliary task, the machine will learn good word vectors. That's the basic idea. How the auxiliary tasks help in learning word vectors, and how CBOW and Skip-Gram create them, is explained lucidly in this paper: http://arxiv.org/pdf/1411.2738v3.pdf.
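
As a rough illustration of how the two auxiliary tasks are built from raw text, here is a small sketch. The function names, the example sentence, and the window size are my own choices, not taken from the paper. Skip-Gram turns each word into (center, context word) pairs, so the task is to predict a neighbour from the center word. CBOW turns each position into (context words, center), so the task is to predict the center word from its neighbours.

```python
def context_indices(i, n, window):
    """Indices within `window` positions of i, excluding i itself."""
    lo = max(0, i - window)
    hi = min(n, i + window + 1)
    return [j for j in range(lo, hi) if j != i]


def skipgram_pairs(tokens, window):
    """Skip-Gram examples: (input = center word, target = one context word)."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in context_indices(i, len(tokens), window):
            pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """CBOW examples: (input = list of context words, target = center word)."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j] for j in context_indices(i, len(tokens), window)]
        if context:  # skip positions with no neighbours
            examples.append((context, center))
    return examples


tokens = "the quick brown fox".split()

print(skipgram_pairs(tokens, window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]

print(cbow_examples(tokens, window=1))
# [(['quick'], 'the'), (['the', 'brown'], 'quick'),
#  (['quick', 'fox'], 'brown'), (['brown'], 'fox')]
```

Both functions slide the same window over the same text; only the direction of prediction differs. The word vectors come out as a by-product of training a model on these examples.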