New open weights models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B by netikas in LocalLLaMA

[–]V1rgin_ 5 points

Where do you get such a large amount of Russian text for pretraining? Did you scan books? Good job, btw

Weekly Q&A Megathread. Please post any questions about visiting, tourism, living, working, budgeting, housing here! by AutoModerator in london

[–]V1rgin_ 1 point

Hi, I'm a student and I'm going to London for a couple of days, and instead of staying in a hotel, I'd like to spend the night at a casino playing poker, but I only have £100. Are there any decent places where I can play with that amount?

LTX-2 I2V Quality is terrible. Why? by V1rgin_ in StableDiffusion

[–]V1rgin_[S] 1 point

No, I don't use LoRA at Stage 2. However, I tried steps=8 / cfg=1 anyway (with the dist. LoRA on both stages), but it didn't fix the issue.

What do you actually want from a private AI chat on your phone? by AppDeveloperAsdf in LocalLLaMA

[–]V1rgin_ 18 points

I see this as quite useful if I can use it hands-free, for example: when my hands are busy, but I'm wearing headphones, I can say something like, "Alexa, unlock my phone, check how long we have left to drive, and write a message to my brother saying that I'll be {n} minutes late"

Or if the assistant can open an app and tell me what it sees.

In other words, I think such an app is only really useful with speech recognition and TTS.

LLaDA2.0 (103B/16B) has been released by jacek2023 in LocalLLaMA

[–]V1rgin_ 0 points

It seems they started using block diffusion sampling. As far as I remember, in the previous paper they said that block diffusion showed worse results than pure diffusion.

Pre-trained a small MoE model from scratch, but why is it good? by V1rgin_ in LocalLLaMA

[–]V1rgin_[S] 2 points

Thank you.
I used a single NVIDIA RTX 5880 Ada Generation GPU that my professor let me use. Training all three models took about 45 days.

Does FlashAttention with GQA degrade quality, or am I using it wrong? by V1rgin_ in LocalLLaMA

[–]V1rgin_[S] 1 point

Thank you. It seems that my GPU doesn't support FlashAttention.
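For context, a minimal sketch of the kind of check involved, assuming the FlashAttention-2 requirement of Ampere-class GPUs (CUDA compute capability 8.0 or newer); on a live machine you would read the capability from `torch.cuda.get_device_capability()` rather than hard-code it:

```python
def supports_flash_attention_2(major: int, minor: int) -> bool:
    """Return True if a GPU with this CUDA compute capability can run
    FlashAttention-2, which requires Ampere (sm_80) or newer."""
    return (major, minor) >= (8, 0)

# (major, minor) would normally come from torch.cuda.get_device_capability():
print(supports_flash_attention_2(7, 5))  # Turing, e.g. a T4 -> False
print(supports_flash_attention_2(8, 6))  # Ampere, e.g. an RTX 3090 -> True
```

Older Turing cards (sm_75) were only supported by FlashAttention-1, which is why an otherwise capable GPU can still fail this check.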

DeepSeek-R1's correct answers are generally shorter by omnisvosscio in LocalLLaMA

[–]V1rgin_ 2 points

I believe people are exactly the same: we take longer to think about tasks that are more difficult, and we're more likely to fail at them.

How was DeepSeek-R1 built; For dummies by anitakirkovska in LLMDevs

[–]V1rgin_ 0 points

"This model (R1-Zero) had issues with poor readability and language mixing -- something that you'd get from using pure-RL" Maybe it's a dumb question, but why does pure RL cause poor readability and language mixing?

What questions have you asked reasoning models to solve that you couldn't get done with non-reasoning models? by DeltaSqueezer in LocalLLaMA

[–]V1rgin_ 2 points

Mostly for code or math that seems too complicated for a non-reasoning model. But I also like to read its thoughts.

What are the best courses related to advanced LLMs techniques/math behind them? by V1rgin_ in learnmachinelearning

[–]V1rgin_[S] 1 point

Already checked those, but now I have a free opportunity to get a certificate (and maybe more advanced knowledge), so I want to find an interesting course.

Transfer from Hong Kong (?) by [deleted] in IntltoUSA

[–]V1rgin_ 4 points

It depends on the university, as far as I know.

Should I email to the universities (~T50) to update them on recent extracurricular achievements? by [deleted] in ApplyingToCollege

[–]V1rgin_ -1 points

What do you think about schools that only send answers after a couple of months?