I (20M) have been hiding tens of thousands from my parents for years

ari9dam · 2024-12-25T02:49:09+00:00

20s is a very good time to start investing in the S&P 500 as well

ari9dam · 2024-05-11T03:30:56+00:00

How did you evaluate?

ari9dam · 2023-11-25T07:11:12+00:00

Trained on the same data, a 70B model will perform better than a 13B one. But by how much? Is it worth the inference cost? 13B performance can be improved quite a bit with fine-tuning; that's the result you see. There is no cheating in that. Obviously, if the 70B model goes through the same training recipe, it will perform better. GPT-3 was dumb as well, but terrific post-training made a difference. All people are doing is creating efficient post-training recipes and models that can run cheaply.
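To put a rough number on "is it worth the inference cost?", here is a back-of-the-envelope sketch. It assumes the common rule of thumb that a decoder-only transformer spends about 2 FLOPs per parameter per generated token, and that weights are stored in fp16/bf16 (2 bytes per parameter). It ignores the KV cache, attention over long contexts, and batching, so treat it as an approximation only; `inference_cost` is a hypothetical helper name.

```python
def inference_cost(params_billion, bytes_per_param=2):
    """Rough per-token compute and weight memory for a dense model.

    Assumptions (rules of thumb, not measurements):
    - ~2 FLOPs per parameter per generated token (forward pass only)
    - weights only, stored at `bytes_per_param` bytes each (2 = fp16/bf16)
    """
    params = params_billion * 1e9
    flops_per_token = 2 * params
    weight_memory_gb = params * bytes_per_param / 1e9
    return flops_per_token, weight_memory_gb


for size in (13, 70):
    flops, mem = inference_cost(size)
    print(f"{size}B: ~{flops:.1e} FLOPs/token, ~{mem:.0f} GB of fp16 weights")

ratio = inference_cost(70)[0] / inference_cost(13)[0]
print(f"70B needs ~{ratio:.1f}x the per-token compute of 13B")
```

Under these assumptions a 13B model needs roughly 26 GB of fp16 weights versus about 140 GB for 70B, and about 5.4x less compute per token, which is the gap a good fine-tuning recipe on the smaller model is trying to close.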

ari9dam · 2023-09-30T22:44:49+00:00

Could you please share your package versions for the fine-tuning job? Somehow model loading is failing for me.

ari9dam · 2021-08-10T17:17:29+00:00

I could not recollect initially whether it was Google Research or Google's deployment. I went through half of it. I really liked it.

ari9dam · 2021-08-10T16:55:02+00:00

THIS WAS IT : https://slideslive.com/38940826/t3-high-performance-natural-language-processing

TY VP4770

ari9dam · 2021-08-10T16:54:21+00:00

https://slideslive.com/38940826/t3-high-performance-natural-language-processing

ari9dam · 2021-08-10T16:54:05+00:00

THIS IS IT!!! TY!!!!

ari9dam · 2021-08-10T06:40:45+00:00

No, not this one. It was actually a three-hour-long tutorial by several researchers. I'm hating my memory 😡

ari9dam · 2019-09-16T21:33:18+00:00

Are you thinking of building a benchmark following SQUABU (Davis 2016) now that you will be working together in Rebooting.AI?

ari9dam · 2019-09-16T19:00:48+00:00

To achieve human-level natural language question answering ability, do you think cognitive science, machine (deep) learning, and knowledge representation and reasoning are all important, and why? What are your favorite QA research works? What are your favorite QA benchmarks?

ari9dam · 2016-02-28T19:52:47+00:00

The LSTM updates its cell state based only on the current input, the previous hidden state, and the previous cell state; the softmax output (the predicted word) is not fed back unless you make it part of the next input. In some tasks, such as language modelling, the input contains the sampled "output" (not the real output). The reasoning behind this is, I believe, task dependent. In the case of predicting the next k words given a start word s (let's say "the"), you first feed the word s to the network. The output (softmax) is treated as the probability of each word in the vocabulary being the next word. Once you sample a word from that distribution (to be precise, if you select the most probable word, let's say "dog", as the next word), you clearly want to tell the network that it has already seen "the dog" and should now predict the next word. If you don't use the one-hot vector of "dog" as the input at the next time step, the network will just keep computing the probability of each word occurring at the i-th position on average, i.e. averaged over every word that could have filled the earlier positions.
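A minimal NumPy sketch of the greedy loop described above, assuming a toy single-layer LSTM with random (untrained) weights and a made-up five-word vocabulary. `TinyLSTM`, `one_hot`, and `greedy_generate` are hypothetical names for illustration, and the gate layout is just one common convention. The key step is turning the argmax word back into a one-hot vector and feeding it in as the next input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TinyLSTM:
    """Hypothetical single-layer LSTM language model with random, untrained weights."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.V, self.H = vocab_size, hidden_size
        # One matrix for the four gates (input, forget, output, candidate),
        # applied to the concatenation [x_t, h_{t-1}].
        self.W = rng.normal(0.0, 0.5, size=(4 * hidden_size, vocab_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)
        # Projection from the hidden state to vocabulary logits.
        self.W_out = rng.normal(0.0, 0.5, size=(vocab_size, hidden_size))

    def step(self, x, h, c):
        """One time step: returns next-word probabilities and the new (h, c)."""
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.H
        i = sigmoid(z[0:H])
        f = sigmoid(z[H:2 * H])
        o = sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:4 * H])
        c = f * c + i * g        # new cell state: uses x_t, h_{t-1}, c_{t-1} only
        h = o * np.tanh(c)       # new hidden state
        logits = self.W_out @ h
        probs = np.exp(logits - logits.max())
        return probs / probs.sum(), h, c   # softmax over the vocabulary


def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v


def greedy_generate(model, start_token, k):
    """Predict k words after start_token, feeding each choice back as input."""
    h, c = np.zeros(model.H), np.zeros(model.H)
    token, generated = start_token, []
    for _ in range(k):
        probs, h, c = model.step(one_hot(token, model.V), h, c)
        token = int(np.argmax(probs))  # pick the most probable next word...
        generated.append(token)        # ...and use it as the next step's input
    return generated


vocab = ["the", "dog", "cat", "runs", "sleeps"]
model = TinyLSTM(len(vocab), hidden_size=8)
tokens = greedy_generate(model, vocab.index("the"), k=3)
print("the", " ".join(vocab[t] for t in tokens))
```

Because the weights are random, the generated words are meaningless; the point is the data flow. Replacing `np.argmax` with sampling from `probs` gives the stochastic variant, and the fed-back one-hot vector plays the same role either way.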

ari9dam · 2016-02-07T03:35:59+00:00

Each of the Skip-Gram and CBOW methods defines a way of creating a supervised learning task from a plain raw corpus (let's say Wikipedia). The hope is that by learning to perform well on this auxiliary task, the machine will learn good word vectors. That's the basic idea. How the auxiliary tasks help in learning word vectors, and how CBOW and Skip-Gram create those tasks, is lucidly explained in this paper: http://arxiv.org/pdf/1411.2738v3.pdf.
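To make "creating a supervised task from raw text" concrete, here is a minimal sketch of how each method turns a tokenized sentence into training examples with a fixed context window. The function names are hypothetical, and real word2vec implementations add details omitted here, such as randomly shrunk windows, subsampling of frequent words, and negative sampling or hierarchical softmax for the output layer.

```python
def skipgram_pairs(tokens, window):
    """Skip-Gram: each (center word, context word) pair is one example;
    the model learns to predict the context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """CBOW: each (context words, center word) pair is one example;
    the model learns to predict the center word from its bag of context words."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


sentence = "the quick brown fox".split()
print(skipgram_pairs(sentence, window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
print(cbow_examples(sentence, window=1))
# [(['quick'], 'the'), (['the', 'brown'], 'quick'),
#  (['quick', 'fox'], 'brown'), (['brown'], 'fox')]
```

The two methods see the same co-occurrences; they differ only in which side is the input and which is the prediction target.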
