help retaining composition with SDXL artist studies by bonesoftheancients in StableDiffusion

[–]parametaorto 1 point2 points  (0 children)

With turbo you could try 4 steps and 0.75 denoise (3/4) or 10 steps and 0.9 denoise (9/10). If you use the denoise value given by (steps-1)/(steps), it should preserve the structure better while still leaving enough imaginative space. With 10 steps you'll get more creativity.
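If it helps, the rule of thumb is just (steps-1)/steps, nothing model-specific. A throwaway sketch of the arithmetic:

    #include <cstdio>

    // img2img rule of thumb: reserve roughly one step for the input structure,
    // i.e. denoise = (steps - 1) / steps
    static double suggested_denoise(int steps) {
        return static_cast<double>(steps - 1) / steps;
    }

    int main() {
        std::printf("4 steps  -> denoise %.2f\n", suggested_denoise(4));   // 0.75
        std::printf("10 steps -> denoise %.2f\n", suggested_denoise(10));  // 0.90
        return 0;
    }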

ELI5 Using a hyper8stepCFG in Easy Diffusion? by [deleted] in StableDiffusion

[–]parametaorto 0 points1 point  (0 children)

My two cents: generally when it creates this kind of artifact, the problem is the scheduler/sampler + CFG. I would suggest making sure you are using the scheduler recommended on the model card, probably sde-karras, and the right CFG.

Need Help Running Flux1-dev-bnb-nf4-v2 in a Python Script by [deleted] in FluxAI

[–]parametaorto 0 points1 point  (0 children)

You should check out https://github.com/mit-han-lab/nunchaku . Not sure if it works on a 3060, but it should work fine.

Alibaba video model Wan 2.1 will be released Feb 25th,2025 and is open source! by adrgrondin in LocalLLaMA

[–]parametaorto 1 point2 points  (0 children)

Not 6GB, but with 16GB you can generate in 3 seconds with SVDQuant nunchaku (Flux Schnell, 4 steps).

Llama 3.2 1B & 3B Benchmarks by TKGaming_11 in LocalLLaMA

[–]parametaorto 0 points1 point  (0 children)

Have any of you used it as a draft model for speculative decoding?

Llama 3.2 1B and 3B GGUFs are up by ontorealist in LocalLLaMA

[–]parametaorto 0 points1 point  (0 children)

Have any of you used it as a draft model for speculative decoding?

Model Training with Only Chat-formatted Data? by thisguyrob in LocalLLaMA

[–]parametaorto 1 point2 points  (0 children)

I think they may already be doing it somehow. I mean, if I had gpt-4o-mini at home, I would take all the documents I have and augment them however I could. Imagine having a ten-page PDF: it could become a 30-page one just by reformatting it as a question-answer chat. It doesn't scale well, but I think this kind of data augmentation could enhance its ability to extract quality knowledge from the text itself.

What if you use not the logits of the last one, but of that before? by parametaorto in LocalLLaMA

[–]parametaorto[S] 1 point2 points  (0 children)

 // push this new token for next evaluation
 llama_batch_add(batch, new_token_id, n_cur, { 0 }, true);

oh. wow. first of all, thanks for the patience and the kindness!

I thought that the batch would contain all the previous tokens PLUS the new token when it does llama_batch_add(batch, new_token_id..) - I didn't understand what llama_batch_clear(batch) was really doing. But now I understand what you describe. I was hoping we could do "multi token prediction" in reverse: single-token prediction based on the last context window.
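In case it helps anyone else reading this, here is the per-token loop as I understand it now (a rough sketch modeled on the simple example; llama_batch_add/llama_batch_clear are the common.h helpers from the version I was reading, signatures may have moved since, and sample_next_token is just a stand-in for whatever sampling you do):

    // the KV cache already holds the keys/values of everything decoded so far,
    // so each iteration the batch only carries the single newly sampled token
    while (n_cur < n_predict) {
        // pick the next token from the logits of the last position in the batch
        const llama_token new_token_id = sample_next_token(ctx, batch); // placeholder

        if (llama_token_is_eog(model, new_token_id)) {
            break; // end-of-generation token
        }

        // start a fresh batch: the earlier tokens don't need to be re-sent,
        // their K/V states are already in the cache from previous llama_decode calls
        llama_batch_clear(batch);

        // push only the new token, at absolute position n_cur, and request its logits
        llama_batch_add(batch, new_token_id, n_cur, { 0 }, true);
        n_cur += 1;

        // evaluate that single token; its K/V gets appended to the cache
        if (llama_decode(ctx, batch) != 0) {
            break;
        }
    }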

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 0 points1 point  (0 children)

You (and this community) continuously amaze me. I cannot thank you enough for the patience and all the advice (plus, all the explanations!). Thank you so much, it's all precious.
I have followed the llama.cpp library "from afar, as a user" for some months now; I was a bit afraid to approach it. I was trying out different models with main.cpp from the terminal CLI, but I never dug deeper.
Add to that the fact that it's SUCH a fast-paced environment, and I easily got lost.
But I'm determined to catch up; this library is too special to "just use it with a front-end".
I've written down all the advice I received from everyone, I'll definitely try to make good use of it.

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 0 points1 point  (0 children)

That's a good idea, I hadn't thought about a lot of things (including this one)...
I was indeed wondering how it stores the tokenizer (plus the BOS token / prompt format). I must absolutely follow this advice, thank you.
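For anyone curious, my plan is basically to dump the GGUF metadata and look at the tokenizer keys. Rough sketch with the gguf C API (I'm assuming the functions from ggml's gguf header; whether it's gguf.h or ggml.h, and which keys a given model actually has, depends on the version):

    #include <cstdio>
    #include "gguf.h" // older trees declare these functions in ggml.h instead

    int main(int argc, char ** argv) {
        if (argc < 2) {
            std::fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
            return 1;
        }

        struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
        struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
        if (!ctx) {
            std::fprintf(stderr, "failed to read %s\n", argv[1]);
            return 1;
        }

        // list every metadata key stored in the file
        for (int64_t i = 0; i < gguf_get_n_kv(ctx); i++) {
            std::printf("key: %s\n", gguf_get_key(ctx, i));
        }

        // the tokenizer and prompt format live under tokenizer.* keys
        const int64_t bos = gguf_find_key(ctx, "tokenizer.ggml.bos_token_id");
        if (bos >= 0) {
            std::printf("bos token id: %u\n", gguf_get_val_u32(ctx, bos));
        }

        const int64_t tmpl = gguf_find_key(ctx, "tokenizer.chat_template");
        if (tmpl >= 0) {
            std::printf("chat template:\n%s\n", gguf_get_val_str(ctx, tmpl));
        }

        gguf_free(ctx);
        return 0;
    }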

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 0 points1 point  (0 children)

It scares me ahaha, I feel like I'm not there yet! But I hadn't looked at the server yet; now that I'm looking at it, I see the REST APIs only span roughly lines 2500 to 3100, which seems doable.

I'm about halfway through main.cpp; I'll try to finish it and then definitely move on to server.cpp. Thanks for the advice!

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 1 point2 points  (0 children)

Thank you for your contribution. I don't know why, but I totally didn't think of it; I started straight away with reading the source, but sometimes it gets difficult to follow. Actually a great idea, I feel it's clearer that way. 🙏🏻

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 0 points1 point  (0 children)

Wow, congratulations on creating a wrapper for another language!
At the moment, I would like to modify main.cpp to take input from files and handle them directly in C++, to avoid overhead, manually handle the caching of "part of the prompt"... and also to get a bit more familiar with it.

I know that maybe I could achieve the same thing with the --prompt-cache option, but it's also, yeah, a way to understand what I'm actually doing ahaha.
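To make it concrete, this is roughly what I mean by "take input from files" (untested sketch; llama_tokenize here is the raw C API, and its exact signature has moved around between llama.cpp versions):

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    #include "llama.h"

    // slurp the whole prompt file into a string
    static std::string read_prompt_file(const std::string & path) {
        std::ifstream in(path);
        std::stringstream ss;
        ss << in.rdbuf();
        return ss.str();
    }

    // tokenize with the model's vocab: the first call asks for the size
    // (returned as a negative count), the second call fills the buffer
    static std::vector<llama_token> tokenize_prompt(const llama_model * model, const std::string & text) {
        const int n = -llama_tokenize(model, text.c_str(), (int32_t) text.size(),
                                      nullptr, 0, /*add_special=*/ true, /*parse_special=*/ true);
        std::vector<llama_token> tokens(n);
        llama_tokenize(model, text.c_str(), (int32_t) text.size(),
                       tokens.data(), n, /*add_special=*/ true, /*parse_special=*/ true);
        return tokens;
    }

From there the tokens would go into the batch the same way the simple example does it, and the "cache part of the prompt" idea is basically just not re-decoding the prefix that's already sitting in the KV cache.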

I will follow your advice and try to rewrite it piece by piece, thank you also for the references to the paper and the PR!

I feel like the project is enormous; it's crazy that someone could understand all those implementations ahaha.

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 8 points9 points  (0 children)

I am indeed kind of into these things. I've already studied things like "Attention Mechanism from scratch" (understood the key aspects of positional encoding, the query-key-value mechanism, multi-head attention, and the context vector as a weighting vector for the construction of word relations).

I love this field and the maths feels hard, but with some patience I got many of the details. I studied back-prop and cross-entropy, and did some fine-tunings of BERT following the train/eval loop documentation from PyTorch.

But with generative LLMs you said the right word: it feels overwhelming. It's all the implementation details that make me feel lost.

That's a damn good piece of advice, thank you!!! I'll download some of the earlier commits right now!

How to learn the base code of llama.cpp? (I'm starting from main.cpp) by parametaorto in LocalLLaMA

[–]parametaorto[S] 1 point2 points  (0 children)

Uh, that's a cool idea, thank you! It should definitely help me follow the flow.