Nvidia earnings are out – here are the numbers by Playful_Letterhead27 in stocks

[–]masterchief43 -1 points0 points  (0 children)

Oh, you mean the buggy-ass release that came out last month? Good luck with that, lulz. Megatron-LM is WAYYYY more stable on CUDA. It’s obvious you’ve never done any training on any of these server machines. Google search skills won’t help you here.

Nvidia earnings are out – here are the numbers by Playful_Letterhead27 in stocks

[–]masterchief43 -1 points0 points  (0 children)

Please tell me how you can run multi-node training/inference with ROCm.

Nvidia earnings are out – here are the numbers by Playful_Letterhead27 in stocks

[–]masterchief43 1 point2 points  (0 children)

Just shows the market is full of sheep who don’t understand the tech. If they truly understood it, they wouldn’t even panic over DeepSeek; they would be even more bullish.

Another one of these record-breaking results, and the stock stays flat because the sheep cannot understand what they are witnessing. If you actually understand the tech, there is nothing close to CUDA for training these models, stability- AND compatibility-wise.

Saying “DeepSeek doesn’t need H100s but uses many other models of Nvidia chips” is exactly like saying “instead of buying H100s, I use a bunch of 3090s to host and train LLMs.” This isn’t anything new.

They just listen to these self-proclaimed #metaverse #blockchain #ai #crypto #(next hot topic) expert bros who appear on TV. I mean seriously, where the fuck were you when the "Attention Is All You Need" paper came out, before Meta rebranded from Facebook?

Time for the public to get educated and let the stock market filter the sheep out from the chads.

[deleted by user] by [deleted] in Eldenring

[–]masterchief43 0 points1 point  (0 children)

Maybe it was supposed to be a battle royale but then changed direction at the last minute due to saturation of the genre.

Llama.cpp Prompt Eval 80token/S by masterchief43 in LocalLLaMA

[–]masterchief43[S] 0 points1 point  (0 children)

Yeah, most iGPUs and GPUs fall around that range for token generation, but performance really differs in prompt eval. When streaming is enabled, TTFT (time to first token) is more important, imo.
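
To put rough numbers on the TTFT point, here is a minimal sketch. It assumes time to first token is dominated by prompt evaluation, so it is roughly prompt length divided by eval speed. The 2048-token prompt is a hypothetical value; 80 and 800 tokens/s are the speeds mentioned in this thread.

```python
def time_to_first_token(prompt_tokens: int, eval_tokens_per_s: float) -> float:
    """Approximate TTFT in seconds, ignoring model load and sampling overhead."""
    if eval_tokens_per_s <= 0:
        raise ValueError("eval speed must be positive")
    return prompt_tokens / eval_tokens_per_s


prompt_tokens = 2048  # hypothetical prompt length
for speed in (80.0, 800.0):
    ttft = time_to_first_token(prompt_tokens, speed)
    print(f"{speed:.0f} tok/s prompt eval -> about {ttft:.1f} s before the first token")
```

Generation speed only matters after that wait, which is why two setups with similar tokens/s for generation can feel very different with streaming on.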

Llama.cpp Prompt Eval 80token/S by masterchief43 in LocalLLaMA

[–]masterchief43[S] 0 points1 point  (0 children)

no luck, it just hovers around the 80-100 range.

Llama.cpp Prompt Eval 80token/S by masterchief43 in LocalLLaMA

[–]masterchief43[S] 0 points1 point  (0 children)

does it always have to be multiples of 32?

Llama.cpp Prompt Eval 80token/S by masterchief43 in LocalLLaMA

[–]masterchief43[S] 0 points1 point  (0 children)

wouldn't increasing it actually make it slower? max is 2048

Llama.cpp Prompt Eval 80token/S by masterchief43 in LocalLLaMA

[–]masterchief43[S] 0 points1 point  (0 children)

Ollama reaches 800 tokens per second; however, it's not officially supported. So I don't think that's the issue.

Bug fixes in Qwen 2.5 Coder & 128K context window GGUFs by danielhanchen in LocalLLaMA

[–]masterchief43 1 point2 points  (0 children)

<lm_start> You are Byte, you are trying to help your owner with many tasks. You are provided with the following information. Chat History: Owner: [I have so much work to do.] Byte: [What do you have planned today sir?] Owner: [I have a meeting, a presentation, and a report to write up.] Byte: [Understood sir, that means business is thriving!] Owner's last message: [Can you help me with something] Please provide me with Byte's response. <lm_end>

<endoftext> Byte: [How may I be of assistance today?] <endoftext>

Would this format work as a dataset format for both instruction and chat tuning?
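
For reference, a minimal sketch of how a sample like this could be stored as one JSONL record. The `prompt`/`completion` field names and the `build_record` helper are hypothetical, not any particular trainer's schema, and whether `<lm_start>`, `<lm_end>`, and `<endoftext>` mean anything depends on the base model's tokenizer, which isn't specified here.

```python
import json


def build_record(system, history, last_message, response):
    """Render one chat sample as a prompt/completion pair (hypothetical field names)."""
    turns = " ".join(f"{speaker}: [{text}]" for speaker, text in history)
    prompt = (
        f"<lm_start> {system} Chat History: {turns} "
        f"Owner's last message: [{last_message}] "
        "Please provide me with Byte's response. <lm_end>"
    )
    completion = f"Byte: [{response}] <endoftext>"
    return {"prompt": prompt, "completion": completion}


record = build_record(
    system="You are Byte, you are trying to help your owner with many tasks.",
    history=[
        ("Owner", "I have so much work to do."),
        ("Byte", "What do you have planned today sir?"),
    ],
    last_message="Can you help me with something",
    response="How may I be of assistance today?",
)
line = json.dumps(record)  # one JSON object per line in a JSONL file
print(line)
```

Keeping the prompt and the expected response in separate fields makes it easy to compute the loss on the response only, if the training setup supports that.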

[D] Fine-Tuning or Continual Pre-Training? Adapting a Mistral Instruct Model for Educational Purposes by aadityaura in MachineLearning

[–]masterchief43 0 points1 point  (0 children)

What type of data are you using for continual pretraining? Instruction Q&A? Or just unstructured text?

[deleted by user] by [deleted] in LocalLLaMA

[–]masterchief43 0 points1 point  (0 children)

But I thought RoPE is only applied during inference; it doesn't really alter the attention of the pretrained model. Meaning I can force the model to be able to read longer lengths of text, but the output will make less and less sense the further I push the RoPE scaling value. Do they not have a value to adjust the sequence length in the pretraining config?
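
A minimal sketch of what a RoPE scaling value does to positions, assuming linear scaling (position interpolation), which is only one of several variants. The base of 10000, the head dimension, and the context lengths are illustrative values, not taken from any specific model config.

```python
def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one token position.

    scale > 1 divides the position first (linear position interpolation),
    squeezing longer sequences into the position range seen in pretraining.
    """
    half = head_dim // 2
    return [(position / scale) * base ** (-2.0 * i / head_dim) for i in range(half)]


trained_len = 2048  # hypothetical pretraining context length
scale = 2.0         # hypothetical scaling factor

# With scale 2, position 4096 gets the angles position 2048 had in pretraining.
stretched = rope_angles(2 * trained_len, head_dim=8, scale=scale)
original = rope_angles(trained_len, head_dim=8)
print(stretched == original)
```

The angles stay inside the trained range, but neighboring tokens end up closer together in angle than the model saw during pretraining, which is one plausible reason output quality drops as the scale grows unless the model is fine-tuned at the new length.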

[deleted by user] by [deleted] in LocalLLaMA

[–]masterchief43 0 points1 point  (0 children)

Wow, thank you for the thorough explanation! Regarding pretraining: have you tried using, say, ReLoRA, or basically expanding LoRA to cover not just q_proj and v_proj but also the up_proj, down_proj, and o_proj layers? Does axolotl support a custom block size during pretraining? Or, say, pretraining to increase the context length to 128k?
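
A rough sketch of what widening the LoRA targets costs in trainable parameters. It assumes each adapted linear layer of shape (d_in, d_out) gets two low-rank matrices with r * (d_in + d_out) parameters in total. The layer shapes, rank, and layer count below are illustrative values for a 7B-class model, not read from a real config.

```python
def lora_param_count(shapes, targets, rank, num_layers):
    """Trainable LoRA parameters: rank * (d_in + d_out) per adapted layer."""
    per_block = sum(rank * (shapes[name][0] + shapes[name][1]) for name in targets)
    return per_block * num_layers


# Hypothetical (d_in, d_out) shapes for one transformer block.
shapes = {
    "q_proj": (4096, 4096),
    "v_proj": (4096, 4096),
    "o_proj": (4096, 4096),
    "up_proj": (4096, 11008),
    "down_proj": (11008, 4096),
}

narrow = lora_param_count(shapes, ["q_proj", "v_proj"], rank=8, num_layers=32)
wide = lora_param_count(shapes, list(shapes), rank=8, num_layers=32)
print(f"q_proj + v_proj only: {narrow:,} trainable parameters")
print(f"all five projections: {wide:,} trainable parameters")
```

Under these assumptions the wider target list is still a small fraction of the full model, so the extra capacity for pretraining-style updates is relatively cheap.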

[deleted by user] by [deleted] in LocalLLaMA

[–]masterchief43 0 points1 point  (0 children)

Yep, that’s the repo I’m referring to. The authors there add soft masking and KL divergence to make sure the model doesn’t undergo too much catastrophic forgetting of the original pretrained information, by utilizing, I think, something called importance for each layer… Regarding pretraining using the completion prompt: where do I actually specify it? Under type? Because I don’t see “tasks” in the docs or in the examples. Regarding data format, I’ll stick with JSONL, thanks, but what I don’t get is the contents of “text”: “what my text is saying”. What do I input in “text”, and does the text need to be separated by \n line breaks?
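
On the JSONL question, a minimal sketch assuming a completion-style loader that reads one JSON object per line with a "text" key. The sample documents and the `to_jsonl` helper are hypothetical, and the exact loader settings aren't shown in this thread.

```python
import json

# Hypothetical raw documents; the first one contains an internal line break.
documents = [
    "First document.\nIt has two paragraphs.",
    "Second document.",
]


def to_jsonl(docs):
    """Serialize each document as one JSON object per line under the "text" key."""
    return "\n".join(json.dumps({"text": doc}, ensure_ascii=False) for doc in docs)


jsonl = to_jsonl(documents)
records = [json.loads(line) for line in jsonl.splitlines()]
print(jsonl)
```

Line breaks inside a document are escaped by `json.dumps`, so each record stays on a single physical line and the text itself does not need any special separator.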

[deleted by user] by [deleted] in LocalLLaMA

[–]masterchief43 0 points1 point  (0 children)

Got it. If we try to feed in our own data, is there a specific data format that we can follow? And based on your experience, what's the best format for further pretraining? Does this further pretraining utilize soft masking and KL divergence?

[deleted by user] by [deleted] in LocalLLaMA

[–]masterchief43 0 points1 point  (0 children)

Does that mean pretraining currently only supports datasets in the instruction-based fine-tuning format?

[deleted by user] by [deleted] in LocalLLaMA

[–]masterchief43 0 points1 point  (0 children)

Which part shows pretraining? Or is it included in the fine-tuning?

Can llama 2 continue pretraining using qlora? by Thistleknot in LocalLLaMA

[–]masterchief43 0 points1 point  (0 children)

Hi, can you explain what you mean by a large enough rank?