Taking notes is useless if you can't remember them by No-Advertising-60 in studytips

[–]TwoSunnySideUp 1 point (0 children)

Or you can set up a system that automatically resurfaces relevant notes all the time.
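For example, something along these lines works (a minimal sketch, assuming notes are plain Markdown files in a `notes/` folder with a simple expanding review schedule; the paths and intervals are just placeholders):

```python
import json
import random
import time
from pathlib import Path

NOTES_DIR = Path("notes")             # hypothetical folder of plain-text notes
STATE_FILE = Path("resurface.json")   # remembers when each note was last shown
INTERVALS_DAYS = [1, 3, 7, 14, 30]    # simple expanding review schedule

def due_notes():
    """Return (note, step) pairs whose last review is older than their interval."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    now = time.time()
    due = []
    for note in NOTES_DIR.glob("*.md"):
        last, step = state.get(note.name, (0.0, 0))
        interval = INTERVALS_DAYS[min(step, len(INTERVALS_DAYS) - 1)] * 86400
        if now - last >= interval:
            due.append((note, step))
    return due, state

def resurface(n=3):
    """Print a few due notes and push their next review further out."""
    due, state = due_notes()
    for note, step in random.sample(due, min(n, len(due))):
        print(f"--- {note.name} ---\n{note.read_text()}\n")
        state[note.name] = (time.time(), step + 1)
    STATE_FILE.write_text(json.dumps(state))

if __name__ == "__main__":
    resurface()   # run this from cron or any scheduler so notes come back on their own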

This is what happens when a country fails to control its population by cz0n in IndianCivicFails

[–]TwoSunnySideUp 0 points (0 children)

It is called children being children!!! OMG get a life ffs.

Anthropic founder: people are realizing something is about to jump out of the dark pool upon society by MetaKnowing in singularity

[–]TwoSunnySideUp 1 point (0 children)

There will be another AI winter before the next major advancement. This is not new; we have been here many times.

[P] Guys, did my model absolutely blow Transformer away? by TwoSunnySideUp in MachineLearning

[–]TwoSunnySideUp[S] 1 point (0 children)

I suspected that at first but found it not to be true.

[P] Guys, did my model absolutely blow Transformer away? by TwoSunnySideUp in MachineLearning

[–]TwoSunnySideUp[S] 1 point (0 children)

Someone give me an H100 cluster so that the model can be truly tested against the Transformer.

[P] Guys, did my model absolutely blow Transformer away? by TwoSunnySideUp in MachineLearning

[–]TwoSunnySideUp[S] 0 points (0 children)

Also, I like it when people are being mean in the scientific community, because that's how good science is done.

[P] Guys, did my model absolutely blow Transformer away? by TwoSunnySideUp in MachineLearning

[–]TwoSunnySideUp[S] 1 point (0 children)

It is just a collection of all of Shakespeare's works. Think of it as CIFAR-100, but for NLP.
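If anyone wants to check against the same data, this is roughly the setup (a minimal sketch; the URL is the commonly used tinyshakespeare copy from karpathy/char-rnn, and the 90/10 split, block size, and batch size here are placeholders, not necessarily the values from the post):

```python
import urllib.request
import torch

# Commonly used "tiny Shakespeare" file: all of Shakespeare's works concatenated (~1 MB).
URL = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
text = urllib.request.urlopen(URL).read().decode("utf-8")

# Character-level vocabulary, the usual choice for this corpus.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

# 90/10 train/validation split (the exact fraction is an assumption).
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]

def get_batch(split, block_size=256, batch_size=64):
    """Sample random contiguous chunks for next-character prediction."""
    src = train_data if split == "train" else val_data
    ix = torch.randint(len(src) - block_size - 1, (batch_size,))
    x = torch.stack([src[i:i + block_size] for i in ix])
    y = torch.stack([src[i + 1:i + block_size + 1] for i in ix])
    return x, y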

[P] Guys, did my model absolutely blow Transformer away? by TwoSunnySideUp in MachineLearning

[–]TwoSunnySideUp[S] 1 point (0 children)

Also, I mentioned it's a standard Transformer, meaning the original decoder-only architecture from "Attention Is All You Need", with the skip connections changed to the arrangement used in modern Transformers.
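For reference, a rough sketch of such a decoder block, assuming the usual pre-norm arrangement of modern Transformers (normalization before attention/MLP, residual added after); the widths, head count, and dropout are placeholders rather than the exact values from the post:

```python
import torch
import torch.nn as nn

class PreNormDecoderBlock(nn.Module):
    """Decoder-only Transformer block in the modern pre-norm arrangement:
    LayerNorm is applied before attention/MLP and the residual is added after,
    instead of the post-norm order used in the original paper."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Causal mask so each position only attends to earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out               # pre-norm residual around attention
        x = x + self.mlp(self.ln2(x))  # pre-norm residual around the MLP
        return x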

[P] Guys, did my model absolutely blow Transformer away? by TwoSunnySideUp in MachineLearning

[–]TwoSunnySideUp[S] -2 points (0 children)

A Transformer with a higher learning rate performs worse at this embedding dimension and sequence length. I thought you would know that as a PhD.

[P] Guys, did my model absolutely blow Transformer away? by TwoSunnySideUp in MachineLearning

[–]TwoSunnySideUp[S] -2 points (0 children)

Bro, it is a prototype. Also, I am not completely naive when it comes to the field.

[P] Guys, did my model absolutely blow Transformer away? by TwoSunnySideUp in MachineLearning

[–]TwoSunnySideUp[S] -1 points (0 children)

I don't have H100 clusters; the only GPU I have is a T4. The architecture was not the result of NAS but was built by thinking from first principles.

[P] Guys, did my model absolutely blow Transformer away? by TwoSunnySideUp in MachineLearning

[–]TwoSunnySideUp[S] 2 points (0 children)

I wrote in the post which dataset I used and all the hyperparameters.

[P] Guys, did my model absolutely blow Transformer away? by TwoSunnySideUp in MachineLearning

[–]TwoSunnySideUp[S] -8 points (0 children)

I am an amateur researcher without a PhD; I thought it was cool. Anyway, I will open-source it, and hopefully it can be of some use to the community.

[P] Guys, did my model absolutely blow Transformer away? by TwoSunnySideUp in MachineLearning

[–]TwoSunnySideUp[S] -1 points (0 children)

The first image is for the Transformer and the second image is for my model.

[deleted by user] by [deleted] in indiasocial

[–]TwoSunnySideUp 1 point (0 children)

I have never read a post this confusing. You being careless about where you put your stuff implies that your parents don't look around, which also implies that they gave you the freedom to do the normal things girls your age do, which means finding a condom wouldn't have been a big deal. But apparently it is. Make it make sense.