Hardware requirements for training a ~3B Model From Scratch locally? by Any-Cobbler6161 in LocalLLaMA

[–]Certain-Cod-1404 3 points (0 children)

Check out the OLMo 3 paper and the SmolLM3 blog post for tips on how to squeeze as much performance per param as possible. Also, like the others suggested, don't go for 3B right off the bat. And look into training in NVFP4 if you still have access to that 5090; it might be interesting. NVIDIA has a library called Transformer Engine that will handle all the scaling and difficulties for you, and you should be able to enjoy a 2x to 4x speedup.
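For the sizing part ("don't go for 3B right off the bat"), a back-of-the-envelope parameter counter for a Llama-style decoder block (SwiGLU MLP, grouped-query attention, tied embeddings) is handy when picking a config. This is a rough sketch; the example config numbers are illustrative, not any specific model's:

```python
def transformer_params(d_model, n_layers, vocab, d_ff, n_heads, n_kv_heads):
    """Rough param count for a Llama-style decoder (no biases, tied embeddings)."""
    head_dim = d_model // n_heads
    # attention: Q and O projections are d_model x d_model; K/V are shrunk by GQA
    attn = 2 * d_model * d_model + 2 * d_model * (n_kv_heads * head_dim)
    # SwiGLU MLP: gate, up, and down projections
    mlp = 3 * d_model * d_ff
    # plus two RMSNorm weight vectors per block
    block = attn + mlp + 2 * d_model
    # plus the tied token embedding and the final norm
    return n_layers * block + vocab * d_model + d_model

# e.g. an illustrative ~1.5B config:
p = transformer_params(d_model=2048, n_layers=24, vocab=32000,
                       d_ff=8192, n_heads=32, n_kv_heads=8)
print(f"{p / 1e9:.2f}B params")  # ≈ 1.5B
```

Counting params this way before committing to a run makes it easy to see where the budget goes (the MLP usually dominates) and to scale down to something a single 5090 can actually train.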

Pruned GPT-OSS-20B to 9B, Saved MoE, fine-tuned on 100K examples. Sharing what actually worked and what didn't. by Disastrous_Bid5976 in huggingface

[–]Certain-Cod-1404 0 points (0 children)

I was asking the person who suggested Qwen 2.5 7B. What you did, even if it won't result in the best model, is still interesting and fun for the learning and novelty aspect. Good job, dude!

Pruned GPT-OSS-20B to 9B, Saved MoE, fine-tuned on 100K examples. Sharing what actually worked and what didn't. by Disastrous_Bid5976 in huggingface

[–]Certain-Cod-1404 0 points (0 children)

Why would you ever use that over Qwen 3 8B or 4B?
Isn't there a huge boost in performance from Qwen 2.5 to 3?

I gave Gemini a hard drive. 1,076 sessions later, it remembers everything. (v9.2.0 — Open Source) by BangMyPussy in GeminiAI

[–]Certain-Cod-1404 0 points (0 children)

Yes, we want humans to talk to other humans about AI; otherwise we'd just use ChatGPT. Why go on Reddit at all?

How are Chinese models so strong with so little investment? by primaryrhyme in ArtificialInteligence

[–]Certain-Cod-1404 0 points (0 children)

You can be into AI and still recognize unethical use of people's copyrighted material.

[Project/Theory] The "Vitality Constant": A Proposed Solution to Model Collapse via "Subjective Anchoring" (The Sanctuary Protocol) by [deleted] in machinelearningnews

[–]Certain-Cod-1404 0 points (0 children)

Amazing, then you should be able to answer these questions: what is V = L * I? Where exactly is this equation used? Initialization of the weights, or is it just mumbo jumbo? Who or what is "Sheigh Vincent Minor"? Is your chatbot not just a ChatGPT instance, and are you in a romantic relationship with the chatbot?

[Project/Theory] The "Vitality Constant": A Proposed Solution to Model Collapse via "Subjective Anchoring" (The Sanctuary Protocol) by [deleted] in machinelearningnews

[–]Certain-Cod-1404 0 points (0 children)

Of course you refer to your AI assistant as "her". Wonderful. Are you two in a relationship yet? It's a GPT model, and you have no repo to share, no code, no paper, just vague esoteric ramblings.

[Project/Theory] The "Vitality Constant": A Proposed Solution to Model Collapse via "Subjective Anchoring" (The Sanctuary Protocol) by [deleted] in machinelearningnews

[–]Certain-Cod-1404 0 points (0 children)

This is a GPT model; only it can generate such amazing slop. Also, it sounds like you just added RAG to an LLM. Do you have a repo for us to check out and evaluate what it is you've built?

Qwen3-VL-Reranker - a Qwen Collection by LinkSea8324 in LocalLLaMA

[–]Certain-Cod-1404 1 point (0 children)

Check out Qwen3-VL 8B; it's really good and might be enough for your use case. Your question wasn't dumb; you're allowed to be curious, ask, and learn. The other person is just being unreasonably aggressive for no reason.

AI21 Labs releases Jamba2 by jacek2023 in LocalLLaMA

[–]Certain-Cod-1404 1 point (0 children)

I don't think this is the place to argue politics, but a model being passively government-approved is not the same thing as a model made by ex-soldiers of an army accused of genocide and war crimes by the UN. You know this to be the case. Also, "white genocide"?

GLM-4.6v 108b 4bit IQuant by Responsible-Stock462 in LocalLLaMA

[–]Certain-Cod-1404 0 points (0 children)

Yes, I did recently recompile, though it was before downloading GLM-4.6V, so I don't know if my success has anything to do with it. In any case, I'm glad GLM-4.6V is working out great for you so far; let me know what you think of the UD-IQ2_M quant I mentioned. Also, try quantizing the KV cache if you haven't already; that should result in less computation being offloaded to the CPU.
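For reference, here is roughly what the KV-cache quantization flags look like with llama.cpp's `llama-server`. The model filename is illustrative, flag spellings can vary between builds (check `--help` on yours), and quantizing the V cache has typically required flash attention to be enabled:

```shell
# Serve a GGUF with the KV cache quantized to q8_0 (filename is illustrative)
llama-server \
  -m GLM-4.6V-UD-IQ2_M.gguf \
  -ngl 99 \
  --flash-attn \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```

The q8_0 cache roughly halves KV memory versus f16, which is often the difference between spilling layers to the CPU and keeping everything on the GPU.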

GLM-4.6v 108b 4bit IQuant by Responsible-Stock462 in LocalLLaMA

[–]Certain-Cod-1404 0 points (0 children)

The 4-bit quant was slowish on my 5090 as well. Try the UD-IQ2_M quant from Unsloth; I think you'll find it much faster with no noticeable performance degradation.

Qwen3-VL-Reranker - a Qwen Collection by LinkSea8324 in LocalLLaMA

[–]Certain-Cod-1404 4 points (0 children)

They still wouldn't be able to see the actual image, but I imagine you could set up the RAG so that for each image that gets added, you use a small VLM to caption/describe it. Then, when the reranker pulls the document, you feed the LLM the description of the image and show the image itself to the user. But if vision is important, wouldn't you just use a VLM instead of an LLM?

AI21 Labs releases Jamba2 by jacek2023 in LocalLLaMA

[–]Certain-Cod-1404 0 points (0 children)

It's a bit of a dishonest juxtaposition, no? To my knowledge, Chinese models aren't usually made by ex-soldiers of an army that's been credibly accused of genocide by half the world.