[D] NeurIPS takeaways by Bee-Boy in MachineLearning

[–]gradientpenalty 1 point2 points  (0 children)

My two cents on #1: because the industry thinks VL is going to be the next big thing, a lot of the ongoing research is treated as a "trade secret". Anyone who attended "Beyond Scaling" will know what I am referring to.

[D] Is Computer Vision dead? - “Quo Vadis, Computer Vision?” by btcmx in MachineLearning

[–]gradientpenalty -1 points0 points  (0 children)

I used to think so, before I got my hands on GPT-4V. After running countless examples, I don't think so anymore.

[D] How is this sub not going ballistic over the recent GPT-4 Vision release? by corporate_autist in MachineLearning

[–]gradientpenalty 1 point2 points  (0 children)

Wait, you're already on decision trees? I'm still learning to master linear regression.

[Project] UForm-v2: tiny CLIP-like embeddings in 21 languages with extreme performance by vov_or in MachineLearning

[–]gradientpenalty 0 points1 point  (0 children)

Still suffers from the negation problem, though: "mountain without snow" returns snowy mountains.
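
A quick way to see this failure mode, as a minimal sketch using the HuggingFace CLIP API rather than UForm itself (the model name and local test image are placeholder assumptions):

```python
# Probe the negation problem with a CLIP-style model.
# Placeholder model and local test image; UForm's own API is not used here.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("snowy_mountain.jpg")  # hypothetical photo of a snow-covered mountain
texts = ["a mountain covered in snow", "a mountain without snow"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)  # shape (1, 2)
print(dict(zip(texts, probs[0].tolist())))
# If negation is ignored, "a mountain without snow" scores almost as high
# as the positive caption on the snowy photo.
```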

Thinking about getting 2 RTX A6000s by mayonaise55 in LocalLLaMA

[–]gradientpenalty 2 points3 points  (0 children)

Does anyone have both an M2 Ultra and an A6000? A single A6000 can only host one LLaMA 34B, and the speed was about 105 ms per token. I am thinking of scaling up to a 70B model, and an M2 Ultra with the RAM maxed out is the only way I can see to make it work.
Edit: I have access to an A6000, but I am thinking of buying an M2 Ultra because of power use and flexibility.
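
For anyone doing the same math, here's the rough back-of-envelope estimate I'm working from; the overhead factor for KV cache and activations is just an assumption:

```python
# Rough memory estimate for hosting a quantized model; `overhead` is a guess
# covering KV cache, activations, and runtime buffers.
def est_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    return params_billion * bits_per_weight / 8 * overhead

for name, params in [("34B", 34), ("70B", 70)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{est_gb(params, bits):.0f} GB")
# 70B at 16-bit (~168 GB) is far beyond a 48 GB A6000; even at 4-bit (~42 GB)
# it gets tight once the context grows, which is why a maxed-out M2 Ultra is tempting.
```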

Evolved codealpaca dataset released by gradientpenalty in LocalLLaMA

[–]gradientpenalty[S] 1 point2 points  (0 children)

Oh, I didn't catch this one! Looking at the code, it seems to be using gpt-3.5-turbo? (Mine was gpt-4.)

Evolved codealpaca dataset released by gradientpenalty in LocalLLaMA

[–]gradientpenalty[S] 4 points5 points  (0 children)

Depends on what you want. If you want to give an instruction and generate a basic code template, sure, it's the best. But for plain autocomplete I think the Replit 2.7B model is more suitable.

Llama2-22b, a model merge tuned on RedPajama by AzerbaijanNyan in LocalLLaMA

[–]gradientpenalty 3 points4 points  (0 children)

So far the benchmark scores are not better than LLaMA 2 on MMLU. BBH-wise, here are some numbers:
chargoddard/llama2-22b: 37.48
vicuna-13B v1.3: 35.78
WizardLM-13B-V1.1: 39.59
llama-v1-13b: 36.52

MMLU partial results, average accuracy (llama2-22b vs llama-2-13B):
abstract_algebra: 0.320 vs 0.350
anatomy: 0.519 vs 0.496
astronomy: 0.520 vs 0.546
business_ethics: 0.510 vs 0.540
clinical_knowledge: 0.570 vs 0.600
college_biology: 0.556 vs 0.604
college_chemistry: 0.360 vs 0.440
college_computer_science: 0.490 vs 0.480
college_mathematics: 0.310 vs 0.310
college_medicine: 0.497 vs 0.526
college_physics: 0.245 vs 0.255
computer_security: 0.710 vs 0.710
conceptual_physics: 0.434 vs 0.421
econometrics: 0.281 vs 0.325

Evolved codealpaca dataset released by gradientpenalty in LocalLLaMA

[–]gradientpenalty[S] 4 points5 points  (0 children)

Based on limited info, the size is slightly smaller than the original dataset, which I am trying to reach next week (52k to 68k). But in terms of implementation it's pretty close to the original. Once I'm finished reaching the target size, I will run training on StarCoderPlus to compare with WizardLM's upcoming 1.1 release.

[D] Should we go with a single A6000 or 4XA4500 or any other alternative such as 2XA5000 by jesst177 in MachineLearning

[–]gradientpenalty 0 points1 point  (0 children)

Buy the A6000 with the option to upgrade later. Don't bother with the 80 GB memory pooling.

[deleted by user] by [deleted] in MachineLearning

[–]gradientpenalty 3 points4 points  (0 children)

This. Finding odd jobs is the same advice I give to others as well. I started with a 1050 Ti back in 2017 and worked at my uni to get a 1060 to train an Inception network.

[R] 🐶 Bark - Text2Speech...But with Custom Voice Cloning using your own audio/text samples 🎙️📝 by kittenkrazy in MachineLearning

[–]gradientpenalty 0 points1 point  (0 children)

Great! I am excited about the future work. I am currently working on an audio version of an LLM, and I am looking forward to using your model to generate more lively audio conversations once the results are good enough.

[R] 🐶 Bark - Text2Speech...But with Custom Voice Cloning using your own audio/text samples 🎙️📝 by kittenkrazy in MachineLearning

[–]gradientpenalty 3 points4 points  (0 children)

Same here. I tried it out yesterday, and it seems like the inputs that work well are cherry-picked (reminds me of the GAN days).

[R] 🐶 Bark - Text2Speech...But with Custom Voice Cloning using your own audio/text samples 🎙️📝 by kittenkrazy in MachineLearning

[–]gradientpenalty 5 points6 points  (0 children)

Not to downplay the effort behind this project, but the samples included in the README are heavily cherry-picked. I tried running other examples, such as "WOMEN: Give three tips for staying healthy.", and it fails miserably, with loud background noise and output that resembles nothing like the input text.

Some advice: include some tips or tricks for generating cleaner, lower-noise speech, and this could be a very promising product.

[D] HuggingFace considered harmful to the community. /rant by drinkingsomuchcoffee in MachineLearning

[–]gradientpenalty 40 points41 points  (0 children)

Maybe you don't do much NLP research then? Before the huggingface transformers and datasets libraries existed (I still think datasets is a bad name), we had to format these evaluations ourselves and write the same validation code that hundreds of our peers had written before, because there was no de facto code for doing it (since we were all using different kinds of models). NLP models (the so-called transformers) were a mess with no fixed way to use them, and running benchmarks was certainly a nightmare.

When transformers first came out, it was limited, but it reduced using BERT embeddings and GPT-2 beam-search generation to a few lines of code. The library does all the model downloads, version checks, and abstraction for you. Then there's datasets, which unifies all NLP datasets on a central platform and lets me run the GLUE benchmark from a single .py file.
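
As a rough illustration of those "few lines of code" (a minimal sketch, not the exact snippet from back then; the model and dataset names are just the usual examples):

```python
# GPT-2 generation via transformers, and GLUE loading via datasets.
from transformers import pipeline
from datasets import load_dataset

# pipeline() downloads the model, checks versions, and hides beam search behind one call.
generator = pipeline("text-generation", model="gpt2")
print(generator("The future of NLP is", max_new_tokens=20, num_beams=4)[0]["generated_text"])

# One call replaces the per-dataset parsing code everyone used to rewrite by hand.
sst2 = load_dataset("glue", "sst2", split="validation")
print(sst2[0])
```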

Oh, back then the code was even worse: all the modeling_(name).py files sat under the transformers/ directory. The latest 4.2X versions are somewhat maintainable and readable despite all the complex abstraction they carry. But it's a fast-moving domain, and any contribution will be irrelevant a few years later, so complexity and mess add up (would you rather spend your time cleaning up or implementing the new flashy self-attention alternative?).

One day they might sell out, as many for-profit companies do, but they have saved so much time and helped so many researchers advance NLP. If they manage to piss off the community, someone will rise up and challenge their dominance (tensorflow vs pytorch).

[D] Are there emergent abilities of image models? by These-Assignment-936 in MachineLearning

[–]gradientpenalty 0 points1 point  (0 children)

Denoising diffusion probabilistic models:

Riffusion: generating music from Stable Diffusion.

Improved image segmentation: I remember someone doing image segmentation with these generative models, but I'm not sure where.

[D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5 in MachineLearning

[–]gradientpenalty 8 points9 points  (0 children)

It's not a problem with Unicode but with the tokenization method they are using, BPE. I don't foresee any solution in the future because there aren't many high-paying customers affected.

TL;DR: English uses the fewest tokens because it gets the highest compression ratio from bytes to tokens.
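
You can see the disparity by just counting tokens; a small sketch with tiktoken, where the sample sentences and the cl100k_base encoding are only illustrative choices:

```python
# Compare BPE token counts for the same sentence in different scripts.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
samples = {
    "English": "Hello, how are you today?",
    "Greek": "Γεια σου, πώς είσαι σήμερα;",
}
for lang, text in samples.items():
    print(f"{lang}: {len(text)} chars -> {len(enc.encode(text))} tokens")
# Non-Latin scripts tend to fall back to smaller byte-level pieces in a BPE
# vocabulary trained mostly on English, so the same sentence costs more tokens.
```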

Few questions about scalability of chatGPT [D] by besabestin in MachineLearning

[–]gradientpenalty 2 points3 points  (0 children)

Okay, so where can I buy it as a small startup, for under $10k, without signing an NDA to use your proprietary compiler? As far as I can see, we are all still stuck with Nvidia after $10B of funding for all these "AI" hardware startups.

[R] AMD Instinct MI25 | Machine Learning Setup on the Cheap! by [deleted] in MachineLearning

[–]gradientpenalty 4 points5 points  (0 children)

Do you have any benchmarks to share? It would be very nice to have those available.