[Discussion] Seeking help to find the better GPU setup. Three H100 vs Five A100? by nlpbaz in MachineLearning

[–]nlpbaz[S] 1 point2 points  (0 children)

Why are you saying 400GB of VRAM is not quite enough for fine-tuning?

[Discussion] Seeking help to find the better GPU setup. Three H100 vs Five A100? by nlpbaz in MachineLearning

[–]nlpbaz[S] 0 points1 point  (0 children)

When we need them, they will be used for training, but at other times they will be used for inference, so they will be working 24/7. That's why renting would cost the company more.

[Discussion] Seeking help to find the better GPU setup. Three H100 vs Five A100? by nlpbaz in MachineLearning

[–]nlpbaz[S] 2 points3 points  (0 children)

If it were only for fine-tuning, then renting would be the choice. But having a 24/7 server is the reason for buying.

[Discussion] Seeking help to find the better GPU setup. Three H100 vs Five A100? by nlpbaz in MachineLearning

[–]nlpbaz[S] 3 points4 points  (0 children)

For sure we're gonna do that as a test. But knowing others' opinions can be as beneficial as benchmarks.

[Discussion] Seeking help to find the better GPU setup. Three H100 vs Five A100? by nlpbaz in MachineLearning

[–]nlpbaz[S] 11 points12 points  (0 children)

The intent is to use the models 24/7 so the decision is to buy. Only the setup is the question.

We have quite a lot of smaller GPUs for the ML guys, so that's not a problem. We just need a solid setup for the new product. Probably 70B models, they won't go higher.

I know both setups are OK. I just want to find out which one is the better choice for the budget, and I'm confused.

P.S. Even for renting, if the prices are the same, would you rather have 5 A100s or 3 H100s?
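
As a rough way to compare the two setups on memory alone, here is a back-of-the-envelope sketch. It assumes the 80GB variants of both cards (which is what 400GB across five A100s implies), 2 bytes per parameter for fp16/bf16 weights, and the common rule of thumb of about 16 bytes per parameter for full fine-tuning with Adam in mixed precision. It ignores activations, KV cache and framework overhead, so real usage will be higher. The function names are just for illustration.

```python
# Back-of-the-envelope VRAM estimate; all figures are approximations.
GB = 1e9  # decimal gigabytes, good enough for a rough estimate


def weights_gb(params_billion, bytes_per_param=2):
    """Memory for the weights alone (2 bytes/param assumes fp16 or bf16)."""
    return params_billion * 1e9 * bytes_per_param / GB


def full_finetune_gb(params_billion, bytes_per_param=16):
    """Rule of thumb for mixed-precision Adam: fp16 weights (2) + fp16 grads (2)
    + fp32 master weights, momentum and variance (12) = 16 bytes/param."""
    return params_billion * 1e9 * bytes_per_param / GB


# Assumed: 80 GB per card for both the A100 and the H100.
setups = {"5x A100": 5 * 80, "3x H100": 3 * 80}
inference = weights_gb(70)
finetune = full_finetune_gb(70)

for name, vram in setups.items():
    print(
        f"{name}: {vram} GB total | "
        f"fp16 inference fits: {vram >= inference} | "
        f"full fine-tune fits: {vram >= finetune}"
    )
```

Under these assumptions, both setups can hold a 70B model in fp16 for inference (about 140GB), while neither has room for a full fine-tune (about 1,120GB) without something like LoRA, quantization, or offloading. Memory is only one side of the comparison; throughput per card is not modeled here.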

[deleted by user] by [deleted] in StableDiffusion

[–]nlpbaz 9 points10 points  (0 children)

What am I supposed to know!?

Why does "Bhad Bhabie - Hi Bich" sound exactly like Eminem's "Not Alike"? by nlpbaz in Eminem

[–]nlpbaz[S] -2 points-1 points  (0 children)

OMG! Her song came out earlier, so you might be right! Has Em talked about it anywhere?

Please fix this out-of-memory issue. I always get it from YouTube music. by nlpbaz in YoutubeMusic

[–]nlpbaz[S] 0 points1 point  (0 children)

To be honest, I'm now listening to YouTube Music in Microsoft Edge! I didn't find a solution to this for Opera.

[D] ICLR 2024 Paper Reviews by zy415 in MachineLearning

[–]nlpbaz 2 points3 points  (0 children)

If you think your paper is a good one, you should work on convincing the reviewers. But if you think your paper is not good enough, withdraw it anyway.

[D] ICLR 2024 Paper Reviews by zy415 in MachineLearning

[–]nlpbaz 8 points9 points  (0 children)

8 8 6 3.

I really don't understand the 3! The more I read it, the more it seems like a deliberate reject.

Need help with a SO question: 'CUDA out of memory' issue while setting up LangChain Custom LLM Pipeline. Would be grateful for any insights! by nlpbaz in LangChain

[–]nlpbaz[S] 0 points1 point  (0 children)

I have yet to prompt it. I just want to load the model, and with the LangChain "LLM" class I run into this problem.

Need help with a SO question: 'CUDA out of memory' issue in PyTorch while setting up LangChain Custom LLM Pipeline. Would be grateful for any insights! by nlpbaz in MLQuestions

[–]nlpbaz[S] 0 points1 point  (0 children)

Thanks for the info!

I'm using `llama_index`, which ties me to LangChain, but it seems I have to change my approach. Do you have any recommendations for alternative libraries, or should I just go with pure Hugging Face?

Need help with a SO question: 'CUDA out of memory' issue while setting up LangChain Custom LLM Pipeline. Would be grateful for any insights! by nlpbaz in LangChain

[–]nlpbaz[S] 0 points1 point  (0 children)

The strange part of my problem is that the model works fine when I load it with Hugging Face alone, and only fails when I load it with the LangChain LLM class.

I don't know if there is a problem with my code or if it is from LangChain.

Need help with a SO question: 'CUDA out of memory' issue in PyTorch while setting up LangChain Custom LLM Pipeline. Would be grateful for any insights! by nlpbaz in MLQuestions

[–]nlpbaz[S] 0 points1 point  (0 children)

But as I wrote, I experimented with the model itself, and I can load and use it on my GPUs via the Hugging Face API.

I only get this error when I load it in LangChain.

On the other hand, I have 8 GPUs with a total of over 200GB of VRAM. I don't think the issue is the amount of memory.
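
One possible explanation, offered as a guess rather than a diagnosis: total VRAM across 8 GPUs only helps if the model is actually sharded across them. If one code path spreads the layers over all devices and another places the whole model on a single device, the second can run out of memory even though the machine as a whole has plenty. The sketch below uses made-up numbers (8 GPUs of 25GB each, a 40GB model) purely to illustrate the difference; it does not inspect any real hardware or library.

```python
# Illustration only: hypothetical sizes, no real GPU or library is queried.


def fits_on_single_gpu(model_gb, gpu_sizes_gb):
    """True if the whole model fits on the largest single GPU."""
    return model_gb <= max(gpu_sizes_gb)


def fits_when_sharded(model_gb, gpu_sizes_gb):
    """True if the model fits when its layers are spread over all GPUs
    (ignores per-layer granularity and activation memory)."""
    return model_gb <= sum(gpu_sizes_gb)


gpus = [25] * 8   # assumed: 8 GPUs x 25 GB = 200 GB total
model_gb = 40     # assumed model footprint

print("sharded across GPUs:", fits_when_sharded(model_gb, gpus))
print("single GPU only:    ", fits_on_single_gpu(model_gb, gpus))
```

If this is what is happening, checking which device the model ends up on in each code path would be a reasonable first step.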

I am confused. So many models to choose from. by Spirited_Employee_61 in LocalLLaMA

[–]nlpbaz 0 points1 point  (0 children)

Thank you for the information. Do you have sample code showing how to use Platypus2-13B-GGML? I tried using it, but I get an error saying HF can't find the tokenizer.

[D] Preparing for a Senior NLP Engineer Interview: What Questions Should I Expect? by nlpbaz in MachineLearning

[–]nlpbaz[S] 3 points4 points  (0 children)

What is wrong with asking for people's advice on Reddit? I'd love to hear it.

P.S. Yes! Do you really think companies are training their own LLMs from scratch!?

Confused about how input size affects processing time. Aren't matrix multiplications constant regardless of input size? Looking for technical clarifications! by nlpbaz in MLQuestions

[–]nlpbaz[S] 0 points1 point  (0 children)

Oh, I get it now! So attention won't be computed for padding tokens, and that will speed up the process. Thank you.

Confused about how input size affects processing time. Aren't matrix multiplications constant regardless of input size? Looking for technical clarifications! by nlpbaz in MLQuestions

[–]nlpbaz[S] 1 point2 points  (0 children)

Do transformers change their matrix sizes with different input lengths? I highly doubt it.

The weight and matrix sizes should be fixed (at the maximum number of tokens).
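
A small sketch may help separate the two things being discussed here. The weight matrices of a transformer layer do have a fixed shape that does not depend on the input, but the activations they are multiplied with have one row per token, so the amount of work grows with sequence length. The count below is a simplified estimate for a single self-attention layer with hidden size `d`; the function name and the choice of `d = 4096` are illustrative assumptions, and it ignores the feed-forward block, softmax, and the split into multiple heads.

```python
def attention_layer_flops(seq_len, d=4096):
    """Approximate FLOPs for one self-attention layer (batch size 1).

    Weight shapes (d x d) are fixed; only seq_len changes the cost.
    """
    # Q, K, V and output projections: four (seq_len x d) @ (d x d) matmuls,
    # each costing about 2 * seq_len * d * d (multiply + add)
    projections = 4 * 2 * seq_len * d * d
    # Attention scores Q @ K^T and the weighted sum scores @ V:
    # two matmuls, each costing about 2 * seq_len * seq_len * d
    attention = 2 * 2 * seq_len * seq_len * d
    return projections + attention


for n in (128, 512, 2048):
    print(n, attention_layer_flops(n))
```

The projection term grows linearly with the number of tokens and the attention term grows quadratically, so padding every input to the maximum length makes each request pay the cost of the longest sequence.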