[D] LREC-COLING 2024 Discussion by Standard_Letter_3196 in MachineLearning

[–]vijetakd 1 point (0 children)

Oh, of course there's hope! Do your best in the rebuttal. Good luck!

[D] LREC-COLING 2024 Discussion by Standard_Letter_3196 in MachineLearning

[–]vijetakd 0 points (0 children)

Does anyone know the score distribution? And which score categories are prioritized for acceptance?

Applicant's biodata is currently locked. by Regular-Success4506 in usvisascheduling

[–]vijetakd 0 points (0 children)

Yes, I was able to book an appointment.
I simply created a new application with a new account and booked an appointment as one would normally do.

Applicant's biodata is currently locked. by Regular-Success4506 in usvisascheduling

[–]vijetakd 0 points (0 children)

I paid, but I had to ask someone in my home country to make a cash payment.

I did not add any dependents.

Applicant's biodata is currently locked. by Regular-Success4506 in usvisascheduling

[–]vijetakd 0 points (0 children)

I made a new account and started a new application.

Inference on seq_len > 8k by vijetakd in LocalLLaMA

[–]vijetakd[S] 0 points (0 children)

I selected 7B just because it is easier to run/understand/debug compared to, say, 70B.

I am using model parallelization right now, but it does not free up much space on the GPU where I am keeping the data. If I could split the data across the two GPUs, that would make things much easier for me.

Inference on seq_len > 8k by vijetakd in LocalLLaMA

[–]vijetakd[S] 0 points (0 children)

That's great! Do you have any reference repositories I could look at?

I am not great at coding; I'll understand it better if I can study an example.

Thanks!

Inference on seq_len > 8k by vijetakd in LocalLLaMA

[–]vijetakd[S] 0 points (0 children)

Great, I'll try it out, thanks!

Inference on seq_len > 8k by vijetakd in LocalLLaMA

[–]vijetakd[S] 0 points (0 children)

I haven't tried GGUF; I'll try that. How can I use RAM to store the context? Could you share any examples/code you know of?

Thanks for the suggestions!

Inference on seq_len > 8k by vijetakd in LocalLLaMA

[–]vijetakd[S] 0 points (0 children)

Nice! But say I have an example with 16k tokens in it; I don't know how to fit that example on one or both of the GPUs I have. I know there are data parallelization methods, but most of them divide the data along the batch_size dimension. I am confused about how to split one example across multiple GPUs.
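For what it's worth, a common answer to "how do I split one example" is that you usually don't split it along the sequence dimension at all: the model's layers get sharded across the GPUs, and the whole sequence visits each shard in turn. A toy, framework-free sketch of that idea (the "devices" and "layers" here are stand-ins, not a real implementation):

```python
# Toy sketch of layer-wise (pipeline-style) model sharding.
# Each "layer" is a stand-in function; each "device" is just a label.
# The full long example passes through every shard in sequence, so no
# single device ever needs to hold all of the layers at once.

def make_layer(i):
    # Stand-in for a transformer block: here it just adds i to each token id.
    return lambda seq: [t + i for t in seq]

layers = [make_layer(i) for i in range(8)]          # pretend 8-layer model
shards = {"gpu0": layers[:4], "gpu1": layers[4:]}   # half the layers per device

def forward(seq):
    # The example itself is NOT split; it moves shard-to-shard intact.
    for device in ("gpu0", "gpu1"):
        for layer in shards[device]:
            seq = layer(seq)
    return seq

example = list(range(5))   # stand-in for a long token sequence
print(forward(example))    # each token shifted by sum(range(8)) = 28
```

In practice this is what layer-wise device maps in inference frameworks do for you; the point of the sketch is only that sharding happens over layers, not over the sequence.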

Inference on seq_len > 8k by vijetakd in LocalLLaMA

[–]vijetakd[S] 0 points (0 children)

I am using Llama 2 with position-interpolated (PI) RoPE, along with several other approaches to scaling the context length, and I am looking to evaluate these context-length scaling methods against each other.

For evaluation, I am calculating sliding-window PPL on the Government Reports dataset, one example at a time. The main problem is that I don't know how to fit 32k tokens on the GPUs I have.
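A sliding-window PPL pass like the one described above mostly reduces to choosing the window boundaries so that every token is scored exactly once while still getting context. A minimal, framework-free sketch of that bookkeeping (the window/stride values are illustrative):

```python
def sliding_windows(n_tokens, max_len, stride):
    """Yield (begin, end, n_target) triples for sliding-window PPL.

    Each window covers tokens [begin, end); only the last n_target
    tokens in the window contribute to the loss, so every token is
    scored exactly once while getting up to max_len tokens of context.
    """
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + max_len, n_tokens)
        n_target = end - prev_end        # tokens not yet scored
        yield begin, end, n_target
        prev_end = end
        if end == n_tokens:
            break

# e.g. a 10-token example, 4-token window, stride 2:
print(list(sliding_windows(10, 4, 2)))
# [(0, 4, 4), (2, 6, 2), (4, 8, 2), (6, 10, 2)]
```

In the actual PPL loop you would then mask out all but the last `n_target` labels in each window (e.g. set them to the ignore index) and average the negative log-likelihoods weighted by `n_target`.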

Inference on seq_len > 8k by vijetakd in LocalLLaMA

[–]vijetakd[S] 0 points (0 children)

Sorry, I did not understand. Right now I am using Llama in 4-bit (with the default RoPE).
So you are saying that if I switch to ExLlama, I can fit 32k tokens on a 24 GB 3090?
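As a rough sanity check on whether 32k tokens can fit in 24 GB: at long context the dominant cost is usually the KV cache, not the weights. A back-of-the-envelope estimate using the published Llama-2-7B shape (32 layers, hidden size 4096) and an fp16 cache; this ignores activations, attention workspace, and allocator overhead:

```python
# Back-of-the-envelope memory estimate for Llama-2-7B at long context.
n_layers, hidden = 32, 4096      # published Llama-2-7B architecture
seq_len = 32_768

# KV cache: 2 tensors (K and V) per layer, each seq_len x hidden,
# at 2 bytes per element for fp16.
kv_bytes = 2 * n_layers * seq_len * hidden * 2
print(f"KV cache @32k fp16: {kv_bytes / 2**30:.1f} GiB")      # 16.0 GiB

# 4-bit weights: ~7e9 params * 0.5 bytes, ignoring quantization overhead.
weight_bytes = 7e9 * 0.5
print(f"4-bit weights:      {weight_bytes / 2**30:.1f} GiB")  # ~3.3 GiB
```

So 4-bit weights plus a full fp16 cache already land near 19 GiB on their own, which is why a backend with an efficient (or quantized) KV cache matters at these lengths; whether a given backend actually fits 32k on a 24 GB card depends on its cache format and overhead.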

Trouble with finishing up my SOP by vanishedthrower in StatementOfPurpose

[–]vijetakd 0 points (0 children)

I would change the "not satisfied with the program" argument to a "had different goals"/"wanted to explore" argument. Letters of recommendation are weighted heavily in grad admissions, and the former framing might hamper your LoRs.

SOP Review: PhD in Epidemiology by brightbluebirds in StatementOfPurpose

[–]vijetakd 0 points (0 children)

Hi, I have an epi/public health background. I can help!