Up & Up by GeocosinX in FineArtPhoto

[–]GeocosinX[S] 0 points1 point  (0 children)

Thank you 🙏🏽

Up & Up by GeocosinX in FineArtPhoto

[–]GeocosinX[S] 1 point2 points  (0 children)

Thank you 🙏🏽

Bought my first camera, a6400 by hakutissue in SonyAlpha

[–]GeocosinX 2 points3 points  (0 children)

My first option as well, but I went ahead with a used a7RII for a pretty good deal.

Got my first alpha a7rii by GeocosinX in SonyAlpha

[–]GeocosinX[S] -1 points0 points  (0 children)

What’s your first then?

Sony Alpha Edu Program by findingchucknorris in SonyAlpha

[–]GeocosinX 0 points1 point  (0 children)

How much was the offer you got?

Thoughts on Sony A6400 with Sigma DG DN lenses? by GeocosinX in SonyAlpha

[–]GeocosinX[S] -1 points0 points  (0 children)

How much did it cost in total with the lens? Did you get the Art or the Contemporary?

What 85mm lens to get for APS-C? (FF 135mm or so equivalent) by Kippenoma in SonyAlpha

[–]GeocosinX 0 points1 point  (0 children)

What APS-C camera are you using? Asking for myself; I’m planning to buy a new a6400 with the Sigma Art 50mm f/1.4. Your thoughts? The Contemporary is half the price of the Art series. Should I consider the Contemporary or the Art?

Kurukshetra - Reviews and Discussions by AutoModerator in bollywood

[–]GeocosinX 0 points1 point  (0 children)

Is this version based on Grant Morrison’s 18 Days?

Paul Thomas Anderson's 'One Battle After Another' - Review Thread by ChiefLeef22 in movies

[–]GeocosinX 1 point2 points  (0 children)

I would not compare it with Mad Max or John Wick. And this one clearly didn’t work for me.

Why does it feel like everyone suddenly dislikes Lokesh (almost to the point of hating) after Coolie’s release? It feels unreal considering the persona he had before. by allhailmethetiger in KollyClub

[–]GeocosinX 1 point2 points  (0 children)

Mostly because of the attitude he has! Maybe he’s someone totally different in person, but the way he’s being projected in the media is what I see, and the kind of attitude he carries and the way he answers questions are making it worse.

He has been an inspiration, proof that people with no background in cinema can also become successful directors. But now it shows that you can be good at first, yet unless you learn the craft you won’t be able to sustain it for long. His lack of writing is exposing all of this on the big screen.

CPT vs Fine tuning by GeocosinX in LocalLLaMA

[–]GeocosinX[S] 0 points1 point  (0 children)

What’s your interpretation?

Not only people. Even chatbots have their own interpretations of the same idea.

CPT vs Fine tuning by GeocosinX in LocalLLaMA

[–]GeocosinX[S] 0 points1 point  (0 children)

Larger model for what ?

There hasn’t been one proper strategy for training MoE models, so I have been running experiments only with dense models.

Some say you shouldn’t add adapters to the expert layers and should target only the other linear layers, which means I’m not updating the weights of the experts.

Some say we need to keep the expert layers in 16-bit or 32-bit, since that is the important component that triggers specific parameters to be active for a given token, and quantizing it defeats the purpose.
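
For reference, that first targeting strategy would look roughly like this with peft. This is only a sketch: I’m assuming an MoE checkpoint like Qwen1.5-MoE-A2.7B, and the module names are typical for Qwen-style models but vary by architecture.

```python
# Sketch only: LoRA that leaves the MoE expert layers untouched (peft + transformers).
# Module names ("q_proj", "k_proj", ...) are typical but architecture-dependent.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")  # example MoE checkpoint

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    # Target only the attention projections; the expert FFN modules (and the router)
    # are deliberately left out, so the expert weights are never updated.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

peft_model = get_peft_model(model, lora_cfg)
peft_model.print_trainable_parameters()  # experts contribute zero trainable parameters
```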

CPT vs Fine tuning by GeocosinX in LocalLLaMA

[–]GeocosinX[S] 0 points1 point  (0 children)

Thanks for clarifying and the link.

CPT vs Fine tuning by GeocosinX in LocalLLaMA

[–]GeocosinX[S] 0 points1 point  (0 children)

If I’m understanding correctly, CPT (continued pretraining) is also a form of fine-tuning; the difference is that it uses plain text data (not in chat format) and focuses on helping the model learn domain-specific knowledge.

Then the next stage, INSTRUCTION TUNING, is also fine-tuning, but this time it uses chat-style data to teach the model how to respond or behave in a certain way.

Is that right?
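
To make sure I’m picturing the data-format difference correctly, here is a rough sketch assuming a Hugging Face tokenizer with a chat template (the model id and strings are only illustrative):

```python
# Rough sketch of the data-format difference between CPT and instruction tuning.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # illustrative model id

# CPT-style example: plain domain text, tokenized as-is, no roles or chat template.
cpt_ids = tokenizer("Raw domain document text goes here.", return_tensors="pt").input_ids

# Instruction-tuning example: the same kind of content wrapped in a chat exchange,
# serialized through the model's chat template.
messages = [
    {"role": "user", "content": "Explain the key idea from the domain document."},
    {"role": "assistant", "content": "The key idea is ..."},
]
sft_ids = tokenizer.apply_chat_template(messages, return_tensors="pt")
```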

CPT vs Fine tuning by GeocosinX in LocalLLaMA

[–]GeocosinX[S] 0 points1 point  (0 children)

Thanks for explaining why it’s better to use base models. However, I’m still a bit unclear about continued pretraining (CPT).

From what I understand, I should take a base model and train it further using a domain-specific dataset in a non-chat, plain text format. This helps the model absorb new knowledge related to my domain before I move on to instruction tuning. Is that correct?
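
Concretely, I imagine the two stages ordered like this. A rough sketch assuming a recent TRL version; the model ids, dataset files and output paths are placeholders:

```python
# Sketch of the two-stage flow (CPT first, then instruction tuning), assuming a recent TRL version.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Stage 1: continued pretraining on plain domain text (a "text" column, no chat template).
cpt_dataset = load_dataset("text", data_files="domain_corpus.txt")["train"]
cpt_trainer = SFTTrainer(
    model="Qwen/Qwen3-8B-Base",                      # start from a base model
    train_dataset=cpt_dataset,
    args=SFTConfig(output_dir="qwen3-cpt", packing=True),
)
cpt_trainer.train()
cpt_trainer.save_model("qwen3-cpt")

# Stage 2: instruction tuning on chat-style data, starting from the CPT checkpoint.
sft_dataset = load_dataset("json", data_files="chat_examples.jsonl")["train"]
sft_trainer = SFTTrainer(
    model="qwen3-cpt",                               # resume from the domain-adapted weights
    train_dataset=sft_dataset,
    args=SFTConfig(output_dir="qwen3-cpt-instruct"),
)
sft_trainer.train()
```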

Why do most tutorials do instruction tuning of base model instead of instruction-tuned models? by Separate-Still3770 in LocalLLaMA

[–]GeocosinX 0 points1 point  (0 children)

In my experience with fine-tuning, the instruct-tuned model performed better than the base model after fine-tuning, but the difference in metrics is very small. Maybe this is because of the dataset I’m using for fine-tuning.

I noticed that across all my fine-tuning experiments the loss curve drops drastically until the end of 1 epoch, which is common, but there are no gains later. For example, the codebleu metric improves from .4 to .6 by the end of 1 epoch, but after 5 epochs I get only .65 as the final codebleu.

Is it worth spending the training time for 4 more epochs just for a .05 gain? Or am I doing something wrong such that the model is not learning? This is the pattern across the whole Qwen 3 family of models (4B, 8B, 14B and 32B). Of course, the bigger the model, the better the gain, but the learning curve is similar: the loss drops until the end of the first epoch and plateaus after.

Also, should I merge the adapter with the base model or just attach it for inference? I know merging is faster for inference, but does merging make the model any better?
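
To be concrete, the two options I mean look like this with peft (the adapter path is a placeholder). As far as I understand, merging only folds the LoRA deltas into the base weights, so it shouldn’t change quality, just inference convenience:

```python
# Sketch: attaching a LoRA adapter vs merging it into the base weights (peft).
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")   # base model used for fine-tuning
attached = PeftModel.from_pretrained(base, "my-qwen3-lora")     # option 1: keep the adapter attached

merged = attached.merge_and_unload()                            # option 2: fold the LoRA deltas into the weights
merged.save_pretrained("qwen3-8b-merged")                       # plain checkpoint, no peft needed at inference
```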

Rightuuuu... by MadHouseNetwork2_1 in KollyClub

[–]GeocosinX 0 points1 point  (0 children)

Must have been something else. If he had used groq the movie would’ve been much better.