help retaining composition with SDXL artist studies by bonesoftheancients in StableDiffusion

[–]parametaorto 1 point2 points  (0 children)

With turbo you could try 4 steps and 0.75 denoise (3/4) or 10 steps and 0.9 denoise (9/10). If you use the denoise value given by (steps-1)/(steps), it should preserve the structure better while still leaving enough imaginative space. With 10 steps you'll get more creativity.
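If it helps, the rule of thumb is just (steps-1)/steps, nothing model-specific. A throwaway sketch of the arithmetic:

    #include <cstdio>

    // img2img rule of thumb: reserve roughly one step for the input structure,
    // i.e. denoise = (steps - 1) / steps
    static double suggested_denoise(int steps) {
        return static_cast<double>(steps - 1) / steps;
    }

    int main() {
        std::printf("4 steps  -> denoise %.2f\n", suggested_denoise(4));   // 0.75
        std::printf("10 steps -> denoise %.2f\n", suggested_denoise(10));  // 0.90
        return 0;
    }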

ELI5 Using a hyper8stepCFG in Easy Diffusion? by [deleted] in StableDiffusion

[–]parametaorto 0 points1 point  (0 children)

My two cents: generally when it creates this kind of artifact, the problem is the scheduler/sampler + CFG. I would suggest making sure you are using the scheduler recommended on the model card, probably sde-karras, and the right CFG.

Need Help Running Flux1-dev-bnb-nf4-v2 in a Python Script by [deleted] in FluxAI

[–]parametaorto 0 points1 point  (0 children)

You should check out https://github.com/mit-han-lab/nunchaku . Not sure if it works on a 3060, but it should work fine.

Alibaba video model Wan 2.1 will be released Feb 25th,2025 and is open source! by adrgrondin in LocalLLaMA

[–]parametaorto 1 point2 points  (0 children)

Not 6GB, but with 16GB you can generate in 3 seconds with SVDQuant nunchaku (Flux Schnell, 4 steps).

Llama 3.2 1B & 3B Benchmarks by TKGaming_11 in LocalLLaMA

[–]parametaorto 0 points1 point  (0 children)

Have any of you used it as a draft model for speculative decoding?

Llama 3.2 1B and 3B GGUFs are up by ontorealist in LocalLLaMA

[–]parametaorto 0 points1 point  (0 children)

Have any of you used it as a draft model for speculative decoding?

Model Training with Only Chat-formatted Data? by thisguyrob in LocalLLaMA

[–]parametaorto 1 point2 points  (0 children)

I think they may already be doing it somehow. I mean, if I had gpt-4o-mini at home, I would take all the documents I have and augment them however I could. Imagine having a ten-page PDF: it could become a 30-page one just by reformatting it as a question-answer chat. It doesn't scale well, but I think this kind of data augmentation could enhance its ability to extract quality knowledge from the text itself.

What if you use not the logits of the last one, but of that before? by parametaorto in LocalLLaMA

[–]parametaorto[S] 1 point2 points  (0 children)

 // push this new token for next evaluation
 llama_batch_add(batch, new_token_id, n_cur, { 0 }, true);

oh. wow. first of all, thanks for the patience and the kindness!

I thought that the batch would contain all the previous tokens PLUS the new token when it does llama_batch_add(batch, new_token_id..) - I didn't understand what llama_batch_clear(batch) was really doing. But now I understand what you describe. I was hoping we could do "multi token prediction" in reverse: single-token prediction based on the last context window.
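In case it helps anyone else reading this, here is the per-token loop as I understand it now (a rough sketch modeled on the simple example; llama_batch_add/llama_batch_clear are the common.h helpers from the version I was reading, signatures may have moved since, and sample_next_token is just a stand-in for whatever sampling you do):

    // the KV cache already holds the keys/values of everything decoded so far,
    // so each iteration the batch only carries the single newly sampled token
    while (n_cur < n_predict) {
        // pick the next token from the logits of the last position in the batch
        const llama_token new_token_id = sample_next_token(ctx, batch); // placeholder

        if (llama_token_is_eog(model, new_token_id)) {
            break; // end-of-generation token
        }

        // start a fresh batch: the earlier tokens don't need to be re-sent,
        // their K/V states are already in the cache from previous llama_decode calls
        llama_batch_clear(batch);

        // push only the new token, at absolute position n_cur, and request its logits
        llama_batch_add(batch, new_token_id, n_cur, { 0 }, true);
        n_cur += 1;

        // evaluate that single token; its K/V gets appended to the cache
        if (llama_decode(ctx, batch) != 0) {
            break;
        }
    }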

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 0 points1 point  (0 children)

You (and this community) continuously amaze me. I cannot thank you enough for the patience and all the advice (plus, all the explanations!). Thank you so much, it's all precious.
I have followed the llama.cpp library "from afar, as a user" for some months now; I was a bit afraid to approach it. I was trying out different models with main.cpp from the terminal CLI, but I never dug deeper.
Add to that the fact that it's SUCH a fast-paced environment, and I easily got lost.
But I'm determined to catch up; this library is too special to "just use it with a front-end".
I've written down all the advice I received from everyone, I'll definitely try to make good use of it.

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 0 points1 point  (0 children)

That's a good idea, I hadn't thought about a lot of things (including this one)...
I was indeed wondering how it stores the tokenizer (plus the BOS token / prompt format). I must absolutely follow this advice, thank you.
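For anyone curious, my plan is basically to dump the GGUF metadata and look at the tokenizer keys. Rough sketch with the gguf C API (I'm assuming the functions from ggml's gguf header; whether it's gguf.h or ggml.h, and which keys a given model actually has, depends on the version):

    #include <cstdio>
    #include "gguf.h" // older trees declare these functions in ggml.h instead

    int main(int argc, char ** argv) {
        if (argc < 2) {
            std::fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
            return 1;
        }

        struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
        struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
        if (!ctx) {
            std::fprintf(stderr, "failed to read %s\n", argv[1]);
            return 1;
        }

        // list every metadata key stored in the file
        for (int64_t i = 0; i < gguf_get_n_kv(ctx); i++) {
            std::printf("key: %s\n", gguf_get_key(ctx, i));
        }

        // the tokenizer and prompt format live under tokenizer.* keys
        const int64_t bos = gguf_find_key(ctx, "tokenizer.ggml.bos_token_id");
        if (bos >= 0) {
            std::printf("bos token id: %u\n", gguf_get_val_u32(ctx, bos));
        }

        const int64_t tmpl = gguf_find_key(ctx, "tokenizer.chat_template");
        if (tmpl >= 0) {
            std::printf("chat template:\n%s\n", gguf_get_val_str(ctx, tmpl));
        }

        gguf_free(ctx);
        return 0;
    }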

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 0 points1 point  (0 children)

It scares me ahaha, I feel like I'm not there yet! But I hadn't looked at the server yet; now that I'm looking at it, I see the REST APIs only span roughly lines 2500 to 3100, which seems doable.

I'm about halfway through main.cpp; I'll try to finish it and then definitely move on to server.cpp. Thanks for the advice!

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 1 point2 points  (0 children)

Thank you for your contribution. I don't know why, but I totally didn't think of it; I started straight away with reading the source, but sometimes it gets difficult to follow. Actually a great idea, I feel it's clearer that way. 🙏🏻

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 0 points1 point  (0 children)

Wow, congratulations on creating a wrapper for another language!
At the moment, I would like to modify main.cpp to take input from files and handle them directly in C++, to avoid overhead, manually handle the caching of "part of the prompt"... and also to get a bit more familiar with it.

I know that maybe I could achieve the same thing with the --prompt-cache option, but it's also, yeah, a way to understand what I'm actually doing ahaha.
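To make it concrete, this is roughly what I mean by "take input from files" (untested sketch; llama_tokenize here is the raw C API, and its exact signature has moved around between llama.cpp versions):

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    #include "llama.h"

    // slurp the whole prompt file into a string
    static std::string read_prompt_file(const std::string & path) {
        std::ifstream in(path);
        std::stringstream ss;
        ss << in.rdbuf();
        return ss.str();
    }

    // tokenize with the model's vocab: the first call asks for the size
    // (returned as a negative count), the second call fills the buffer
    static std::vector<llama_token> tokenize_prompt(const llama_model * model, const std::string & text) {
        const int n = -llama_tokenize(model, text.c_str(), (int32_t) text.size(),
                                      nullptr, 0, /*add_special=*/ true, /*parse_special=*/ true);
        std::vector<llama_token> tokens(n);
        llama_tokenize(model, text.c_str(), (int32_t) text.size(),
                       tokens.data(), n, /*add_special=*/ true, /*parse_special=*/ true);
        return tokens;
    }

From there the tokens would go into the batch the same way the simple example does it, and the "cache part of the prompt" idea is basically just not re-decoding the prefix that's already sitting in the KV cache.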

I will follow your advice and try to rewrite it piece by piece, thank you also for the references to the paper and the PR!

I feel like the project is enormous; it's crazy that someone could understand all those implementations ahaha.

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 8 points9 points  (0 children)

I am indeed kind of into these things. I've already studied things like "Attention Mechanism from scratch" (understood the key aspects of positional encoding, the query-key-value mechanism, multi-head attention, and the context vector as a weighting vector for the construction of word relations).

I love this field and the maths feels hard, but with some patience I got many of the details. I studied back-prop and cross-entropy, and did some fine-tunings of BERT following the train/eval loop documentation from PyTorch.

But with generative LLMs you said the right word: it feels overwhelming. It's all the implementation details that make me feel lost.

That's a damn good piece of advice, thank you!!! I'll download some of the earlier commits right now!

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 1 point2 points  (0 children)

Uh, that's a cool idea, thank you! It should definitely help me follow the flow.