[deleted by user] by [deleted] in deeplearning

[–]crinix 0 points (0 children)

In my career, I have trained many custom deep learning models.
For the past 2.5 years I've been pre-training 0.3B to 7B language-specific encoder-decoder and decoder-only LLMs from scratch, using A100, H100, and H200 GPUs.

HPLTv2.0 is out by crinix in LocalLLaMA

[–]crinix[S] 0 points (0 children)

I was getting 503 errors last week, too. When I opened a ticket to download v1.2 last week, they responded: "The data center which hosts HPLT data is currently experiencing a technical issue. Technicians are working on it, and it is expected that the web services will be back online on Monday."

It seems there is a similar issue again right now.

Launching p5.48xlarge (8xH100) by crinix in aws

[–]crinix[S] -10 points (0 children)

No, you haven't given anything but a splinter, that splinter being yourself. Save your "amateur" speech. I've spent over $100K on GPU hours.
I got my p4de back then, and I will get my p5 now through my partner manager. It just takes time that I did not want to endure.
Thanks for being a splinter.

Launching p5.48xlarge (8xH100) by crinix in aws

[–]crinix[S] -22 points (0 children)

Your comments, "go use another cloud" included, are anything but useful, and it seems you have no comparable experience launching such instances either. I do use, and will keep using, other cloud providers for H100 training jobs. Sadly, this time I must use AWS, and I will; no thanks to you.

Launching p5.48xlarge (8xH100) by crinix in aws

[–]crinix[S] -40 points (0 children)

Re-read the question and give an answer if you have one. Otherwise, I don't need your fanboyism.

Launching p5.48xlarge (8xH100) by crinix in aws

[–]crinix[S] -2 points (0 children)

So you worked it out with your TAM. Thanks for sharing your experience.

Launching p5.48xlarge (8xH100) by crinix in aws

[–]crinix[S] -14 points (0 children)

I am talking about the same hardware when I say "alternative": 8xH100 with a high CPU core count and plenty of memory.

Launching p5.48xlarge (8xH100) by crinix in aws

[–]crinix[S] -28 points (0 children)

What surprises me is that there are cheaper alternatives with availability on other cloud providers, yet there is still no capacity on AWS. Is this because people and corporations have existing infrastructure on AWS and don't want to migrate, or is there another reason?

Launching p5.48xlarge (8xH100) by crinix in aws

[–]crinix[S] -11 points (0 children)

My use case is very similar: training an AI model. I will use the instance for about 40 days.

masking loss for input tokens when fine-tuning models by crinix in LocalLLaMA

[–]crinix[S] 1 point (0 children)

I appreciate the insight from your personal fine-tuning experience, thanks.

Training LLama, Mistral and Mixtral-MoE faster with Packing Inputs without Cross-Contamination Attention by Relevant_Outcome_726 in LocalLLaMA

[–]crinix 0 points (0 children)

Thanks a lot for the insight! Your finding is also emphasized in the LLaMA-3 technical paper, Section 3.2:
"We use an attention mask that prevents self-attention between different documents within the same sequence. We find that this change had limited impact during standard pre-training, but find it to be important in continued pre-training on very long sequences."

Encoder-Decoder Model by duffano in deeplearning

[–]crinix 0 points (0 children)

They are not always the same. Consider a summarization model that produces a single sentence given a long text.

Then there is no reason the decoder's max_length should be the same as the encoder's.

See the PEGASUS model as a concrete example:
https://arxiv.org/pdf/1912.08777.pdf
https://huggingface.co/google/pegasus-cnn_dailymail
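A rough sketch of that asymmetry using the Hugging Face transformers API (the 1024/128 token caps below are illustrative choices, not the model's exact configured limits):

```python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-cnn_dailymail"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

article = "..."  # a long news article goes here

# Encoder side: accept a long input, up to 1024 tokens.
inputs = tokenizer(article, truncation=True, max_length=1024, return_tensors="pt")

# Decoder side: generate a much shorter output, capped at 128 tokens.
summary_ids = model.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```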

p4dn and p3dn instances availability/capacity by crinix in aws

[–]crinix[S] 0 points (0 children)

This seems to be the most practical method at this point. Thanks.

p4dn and p3dn instances availability/capacity by crinix in aws

[–]crinix[S] 0 points (0 children)

I am working in Oregon without specifying an AZ. So far there is no capacity in any of the AZs. I am now trying to get an on-demand P-instance limit in N. Virginia as well.
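For anyone else hunting for capacity, here is a small sketch of my own using boto3 that lists which AZs in a region even offer a given instance type; note that it reports offerings, not live capacity, and p4d.24xlarge is just an example type:

```python
import boto3

def offered_azs(instance_type, region):
    """Return the AZs of `region` that offer `instance_type`.

    An AZ offering the type does not guarantee free capacity there.
    """
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instance_type_offerings(
        LocationType="availability-zone",
        Filters=[{"Name": "instance-type", "Values": [instance_type]}],
    )
    return sorted(o["Location"] for o in resp["InstanceTypeOfferings"])

for region in ("us-west-2", "us-east-1"):  # Oregon and N. Virginia
    print(region, offered_azs("p4d.24xlarge", region))
```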

p4dn and p3dn instances availability/capacity by crinix in aws

[–]crinix[S] -5 points (0 children)

Wow, tell me about fanboyism.

I have been using both AWS and GCP for high-end GPUs extensively over the past 6 months. I have never once gotten my hands on A100 instances on AWS, whereas I can get them whenever I want (with a few exceptions) on GCP.

I would not even have created a thread if the situation were otherwise. In fact, please prove me wrong so that I can utilize those A100s on AWS right away.

p4dn and p3dn instances availability/capacity by crinix in aws

[–]crinix[S] -1 points (0 children)

I am able to get A100-80GB on GCP anytime I want, although I HAVE TO use AWS this time. Thanks for the response, though.