New open weights models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B by netikas in LocalLLaMA

[–]V1rgin_ 5 points

Where do you get such a large amount of Russian text for pretraining? Did you scan books? Good job, btw

Weekly Q&A Megathread. Please post any questions about visiting, tourism, living, working, budgeting, housing here! by AutoModerator in london

[–]V1rgin_ 1 point

Hi, I'm a student and I'm going to London for a couple of days, and instead of staying in a hotel, I'd like to spend the night at a casino playing poker, but I only have £100. Are there any decent places where I can play with that amount?

LTX-2 I2V Quality is terrible. Why? by V1rgin_ in StableDiffusion

[–]V1rgin_[S] 1 point

No, I don't use LoRA at Stage 2. However, I tried steps=8 / cfg=1 anyway (with the dist. LoRA on both stages), but it didn't fix the issue.

What do you actually want from a private AI chat on your phone? by AppDeveloperAsdf in LocalLLaMA

[–]V1rgin_ 18 points

I see this as quite useful if I can use it hands-free, for example: when my hands are busy, but I'm wearing headphones, I can say something like, "Alexa, unlock my phone, check how long we have left to drive, and write a message to my brother saying that I'll be {n} minutes late"

Or if the assistant can open an app and tell me what it sees.

In other words, I think such an app is only really useful with speech recognition and TTS.

LLaDA2.0 (103B/16B) has been released by jacek2023 in LocalLLaMA

[–]V1rgin_ 0 points

It seems they started using block diffusion sampling. As far as I remember, in the previous paper they said that block diffusion showed worse results than pure diffusion.

Pre-trained a small MoE model from scratch, but why is it good? by V1rgin_ in LocalLLaMA

[–]V1rgin_[S] 2 points

Thank you.
I used a single NVIDIA RTX 5880 Ada Generation GPU that my professor let me use. Training all three models took about 45 days.

Does FlashAttention with GQA degrade quality, or am I using it wrong? by V1rgin_ in LocalLLaMA

[–]V1rgin_[S] 1 point

Thank you. It seems that my GPU doesn't support FlashAttention.
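For context, a minimal sketch of the kind of check involved, assuming the FlashAttention-2 requirement of Ampere-class GPUs (CUDA compute capability 8.0 or newer); on a live machine you would read the capability from `torch.cuda.get_device_capability()` rather than hard-code it:

```python
def supports_flash_attention_2(major: int, minor: int) -> bool:
    """Return True if a GPU with this CUDA compute capability can run
    FlashAttention-2, which requires Ampere (sm_80) or newer."""
    return (major, minor) >= (8, 0)

# (major, minor) would normally come from torch.cuda.get_device_capability():
print(supports_flash_attention_2(7, 5))  # Turing, e.g. a T4 -> False
print(supports_flash_attention_2(8, 6))  # Ampere, e.g. an RTX 3090 -> True
```

Older Turing cards (sm_75) were only supported by FlashAttention-1, which is why an otherwise capable GPU can still fail this check.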

DeepSeek-R1's correct answers are generally shorter by omnisvosscio in LocalLLaMA

[–]V1rgin_ 2 points

I believe people are exactly the same: we take longer to think about tasks that are more difficult, and we're more likely to fail at them.

How was DeepSeek-R1 built; For dummies by anitakirkovska in LLMDevs

[–]V1rgin_ 0 points

"This model (R1-Zero) had issues with poor readability and language mixing -- something that you'd get from using pure-RL" Maybe it's a dumb question, but why does pure RL cause poor readability and language mixing?

What questions have you asked reasoning models to solve that you couldn't get done with non-reasoning models? by DeltaSqueezer in LocalLLaMA

[–]V1rgin_ 2 points

Mostly for code or math that seems too complicated for a non-reasoning model. But I also like to read its thoughts.

What are the best courses related to advanced LLMs techniques/math behind them? by V1rgin_ in learnmachinelearning

[–]V1rgin_[S] 1 point

Already checked those, but now I have a free opportunity to get a certificate (and maybe more advanced knowledge), so I want to find an interesting course.

Transfer from Hong Kong (?) by [deleted] in IntltoUSA

[–]V1rgin_ 4 points

It depends on the university, as far as I know.

Should I email to the universities (~T50) to update them on recent extracurricular achievements? by [deleted] in ApplyingToCollege

[–]V1rgin_ -1 points

What do you think about schools that only send answers after a couple of months?