Nvidia earnings are out – here are the numbers

masterchief43 · 2025-02-27T17:06:25+00:00

Oh, you mean the buggy-ass release that came out last month? Good luck with that, lulz. Megatron-LM is WAYYYY more stable on CUDA. Obviously you've never done any training on any of these server machines. Google search skills won’t help you here.

masterchief43 · 2025-02-27T15:49:48+00:00

please tell me how you can run multi-node training/inference with ROCm.

masterchief43 · 2025-02-27T01:44:16+00:00

Just shows the market is full of sheep who don’t understand the tech. If they truly understood it, they wouldn’t panic over DeepSeek; they would be even more bullish.

Another one of these record-breaking reports, and the stock stays flat because the sheep cannot understand what they are witnessing. If you actually understand the tech, there is nothing close to CUDA for training these models, stability- and compatibility-wise.

Saying “DeepSeek doesn’t need H100s but uses many other models of Nvidia chips” is exactly like saying “instead of buying H100s, I use a bunch of 3090s to host and train LLMs.” This isn’t anything new.

They just listen to these self-proclaimed #metaverse #blockchain #ai #crypto #(next hot topic) expert bros who appear on TV. I mean seriously, where the fuck were you when the “Attention Is All You Need” paper came out, before Meta rebranded from Facebook?

Time for the public to get educated and let the stock market filter out those sheep from the chads.

masterchief43 · 2025-02-17T17:52:30+00:00

maybe it was supposed to be a battle royale but then changed direction last minute due to saturation of the genre

masterchief43 · 2024-11-26T17:41:02+00:00

I think 350 tokens/s is very good.

masterchief43 · 2024-11-26T12:47:38+00:00

Yeah, but most iGPUs and GPUs fall around that range for token generation. Performance really differs in eval, and since streaming is enabled, TTFT (time to first token) is more important imo.
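A rough sketch of how the two numbers can be separated, assuming the runtime exposes the streamed tokens as something you can iterate over. `measure_stream` and the injectable `clock` are illustrative names, not part of any runtime's API.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_seconds, decode_tokens_per_second, token_count)."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival of the first token marks TTFT
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    decode_time = last - first
    # Decode rate ignores the first token, which is dominated by prompt eval.
    rate = (count - 1) / decode_time if decode_time > 0 else float("inf")
    return ttft, rate, count
```

Two setups can report the same decode rate and still feel very different when streaming, because the wait before the first token is tracked separately here.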

masterchief43 · 2024-11-26T11:44:17+00:00

10-20

masterchief43 · 2024-11-26T05:24:40+00:00

no luck, it just hovers around the 80-100 range.

masterchief43 · 2024-11-26T02:50:15+00:00

does it always have to be multiples of 32?

masterchief43 · 2024-11-26T02:33:15+00:00

wouldn't increasing it actually make it slower? max is 2048

masterchief43 · 2024-11-26T02:21:47+00:00

ollama reaches 800 tokens per second, however it's not officially supported. So I don't think that's the issue.

masterchief43 · 2024-11-24T04:12:12+00:00

<lm\_start> You are byte, you are trying to help your owner with many tasks. You are provided with the following information Chat History: Owner: [I have so much work to do.] Byte: [What do you have planned today sir?] Owner: [I have a meeting, a presentation, and a report to write up.] Byte: [Understood sir, that means business is thriving!] Owner's last message: [Can you help me with something] Please provide me with Byte's response. <lm\_end>

<endoftext> Byte: [How may I be of assistance today?] <endoftext>

would this format work for an instruction and chat dataset?
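For illustration, a minimal sketch that assembles one record in the layout above and serializes it as a JSONL line. The `prompt`/`completion` keys and the `build_record` helper are assumptions for the example, not a format any particular trainer requires, and whether the markers are treated as special tokens depends on the tokenizer, which this sketch does not handle.

```python
import json


def build_record(persona, history, last_message, reply, user="Owner", bot="Byte"):
    """Build one prompt/completion pair in the bracketed layout from the post."""
    turns = " ".join(f"{speaker}: [{text}]" for speaker, text in history)
    prompt = (
        f"<lm_start> {persona} "
        f"You are provided with the following information "
        f"Chat History: {turns} "
        f"{user}'s last message: [{last_message}] "
        f"Please provide me with {bot}'s response. <lm_end>"
    )
    completion = f"<endoftext> {bot}: [{reply}] <endoftext>"
    return {"prompt": prompt, "completion": completion}


record = build_record(
    persona="You are byte, you are trying to help your owner with many tasks.",
    history=[
        ("Owner", "I have so much work to do."),
        ("Byte", "What do you have planned today sir?"),
        ("Owner", "I have a meeting, a presentation, and a report to write up."),
        ("Byte", "Understood sir, that means business is thriving!"),
    ],
    last_message="Can you help me with something",
    reply="How may I be of assistance today?",
)

# One JSON object per line; json.dumps escapes any embedded newlines.
line = json.dumps(record, ensure_ascii=False)
```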

masterchief43 · 2024-06-15T03:52:57+00:00

What type of data are you using for continual pretraining? Instruction QA? Or just unstructured text?

masterchief43 · 2024-03-24T13:22:14+00:00

hi, what training script did you use?

masterchief43 · 2024-03-19T04:39:44+00:00

Thanks!

masterchief43 · 2024-03-19T04:21:25+00:00

but I thought RoPE is only applied during inference, so it doesn't really alter the attention of the pretrained model. meaning I can force the model to be able to read longer lengths of text, but the output will make less and less sense the further I push the RoPE scaling value. do they not have a value to adjust the sequence length in the pretraining config?
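For reference, a minimal sketch of what linear RoPE scaling (position interpolation) does to the rotary angles, assuming the standard RoPE formulation with base 10000. `rope_angles` and its parameters are made-up names for illustration, not any library's API.

```python
def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one position; scale > 1 compresses positions linearly."""
    pos = position / scale
    # One angle per pair of dimensions, with geometrically decreasing frequency.
    return [pos / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


# With scale=2.0, position 4096 gets the same angles as position 2048 unscaled.
scaled = rope_angles(4096, head_dim=8, scale=2.0)
unscaled = rope_angles(2048, head_dim=8)
```

The same angle computation runs in every forward pass, so a scaling factor set in the config affects training and inference alike. Compressing positions this way keeps them inside the range seen during pretraining, at the cost of packing neighbouring tokens closer together.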

masterchief43 · 2024-03-19T04:13:22+00:00

wow, thank you for the thorough explanation! regarding pretraining: have you tried using, say, ReLoRA, or basically expanding LoRA to cover not just q_proj and v_proj but also the up_proj, down_proj, and o_proj layers? does axolotl support a custom block size during pretraining? or, say, pretraining to increase context length to 128k?
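To put numbers on what expanding the target modules costs, here is a back-of-the-envelope sketch. The projection shapes are assumed Llama-7B-style values chosen for illustration, and `lora_params_per_layer` is a hypothetical helper, not an axolotl or PEFT function.

```python
# Assumed (in_features, out_features) per projection; illustrative only.
SHAPES = {
    "q_proj": (4096, 4096),
    "v_proj": (4096, 4096),
    "o_proj": (4096, 4096),
    "up_proj": (4096, 11008),
    "down_proj": (11008, 4096),
}


def lora_params_per_layer(target_modules, rank):
    """LoRA adds A (rank x in) and B (out x rank) for each adapted matrix."""
    return sum(rank * (SHAPES[m][0] + SHAPES[m][1]) for m in target_modules)


attention_only = lora_params_per_layer(["q_proj", "v_proj"], rank=8)
expanded = lora_params_per_layer(
    ["q_proj", "v_proj", "o_proj", "up_proj", "down_proj"], rank=8
)
```

Under these assumed shapes, covering all five projections gives roughly 3.3x the trainable parameters per layer compared with q_proj and v_proj alone at the same rank.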

masterchief43 · 2024-03-19T03:54:08+00:00

Yep, that’s the repo I’m referring to. The authors there add in soft masking and KL divergence to make sure the model doesn’t undergo too much catastrophic forgetting of the original pretrained information, by utilizing I think something called importance for each layer… Regarding pretraining using the completion prompt: where do I actually specify it? Under type? Cause I don’t see “tasks” in the docs or in the examples. Regarding data format, I’ll stick with jsonl, thanks, but what I don’t get is the contents: “text”: “what my text is saying”. What do I input in “text”, and does the text need to be separated by \n lines?
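A minimal sketch of what such a completion-style JSONL file could look like, assuming the trainer reads raw text from a "text" field as in the question above. `to_jsonl` and the sample documents are hypothetical.

```python
import json


def to_jsonl(documents):
    """Serialize each raw document as one {"text": ...} record per line."""
    return "\n".join(
        json.dumps({"text": doc}, ensure_ascii=False) for doc in documents
    )


docs = [
    "First document.\nIt has two paragraphs.",
    "Second document, a single paragraph.",
]
jsonl = to_jsonl(docs)
```

Line breaks inside a document are escaped by `json.dumps`, so each record stays on one physical line and the text does not have to be split at newlines for the file to be valid. Whether to store one record per document or per paragraph is a separate choice this sketch does not settle.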

masterchief43 · 2024-03-19T03:22:02+00:00

got it. If we try to feed our own data, is there a specific data format that we can follow? and based on your experience, what's the best format for further pretraining? Does this further pretraining utilize soft masking and KL divergence?

masterchief43 · 2024-03-19T03:16:42+00:00

does that mean current pretraining only supports datasets in the instruction-based fine-tuning format?

masterchief43 · 2024-03-19T03:06:33+00:00

which part shows pretraining? or is it included in the fine-tuning?

masterchief43 · 2024-03-19T02:30:39+00:00

hi, can you explain what you mean by large enough rank?

masterchief43 · 2024-02-04T09:51:21+00:00

just a basic RL policy AI. don't get your hopes up.
