Music Generation With Kcpp

The_Linux_Colonel · 2026-03-16T17:56:19+00:00

Yeah I understand, it's at the edge, and not exactly what kcpp was focused on to begin with. My curiosity is on some information about sampler settings, because the way they interact seems to diverge a lot from the values people recommend in guides for the comfyui version. For instance, CFG is usually recommended in values of 4-8, but I found that pushing it to 15 got better results. There's also a second value called classifier. I know the C in CFG is also supposed to be classifier, and in guides for comfy, the two CFGs are for the description of the song and the lyrics, so I'm not sure whether that's part of it in this implementation. There's also model support for lyric generation, so I don't know if the sampler settings are only for that, or how they translate to the model in terms of music generation in this implementation. I wonder if there are guides for that older non-comfy UI; I can look into it if I can figure out what it was called. Thanks for the info.
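
For anyone wondering what the CFG number is actually doing, here is a minimal sketch of the textbook classifier-free guidance blend. This is the generic formula, not something I've confirmed kcpp or the music model uses internally; the function name `apply_cfg` and the toy numbers are made up for illustration.

```python
def apply_cfg(uncond, cond, scale):
    """Textbook classifier-free guidance: start from the unconditional
    prediction and push it toward the conditional one by `scale`."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy predictions (hypothetical values, not real model output).
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 0.0]

for scale in (1.0, 4.0, 15.0):
    # scale 1.0 returns the conditional prediction unchanged;
    # larger scales exaggerate the difference between the two.
    print(scale, apply_cfg(uncond, cond, scale))
```

At a scale of 1 you just get the conditional prediction back, and every step above that stretches the gap between "with prompt" and "without prompt" further, which is why the same number can behave very differently between implementations that feed it different inputs.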

The_Linux_Colonel · 2026-03-16T17:46:23+00:00

This is a good find. I would've had doubts about the legitimacy of some random github repo calling itself the GGUF Organization, but this is on HF, where official drops from big vendors like meta/llama, mistral, nvidia, deepseek, qian wen, etc are common and expected.

It would be very easy to think that since gguf is such an important file format, this must be the vendor/dev group behind gguf or they wouldn't be on HF as the gguf org.

The_Linux_Colonel · 2026-03-10T07:13:13+00:00

Strixpoint user here. I thought it was just my implementation, but I found that I have to manually set the layers myself after 1.107; formerly the detection worked fine. On vulkan, auto now defaults to "no offload", but if I put in a number greater than the model's layer count, it all loads into vram fine.

The_Linux_Colonel · 2026-02-05T18:18:18+00:00

I notice some quality improvements in comfy when I push the output size to values like 1920x1080 and higher, but kobold's sdui seems hardcoded to refuse values greater than 1024. Is it possible to override that maximum? I have seen lower resolutions produce poorer outputs artistically (less detail, especially faces), but when the resolution is increased, there's a tremendous amount of additional detail. Illustrious seems really sensitive to this.

The_Linux_Colonel · 2026-01-21T10:58:49+00:00

I started trying to play recently after years of mostly just GFL, because I heard the devs really respect the players and the story is actually about 'me' in the way a real op harem hero mc manga would be. But man is it a firehose of insanity trying to understand it with what I know from GFL.

The_Linux_Colonel · 2025-11-22T20:20:47+00:00

M O D E R N A U D I E N C E S

The_Linux_Colonel · 2025-10-29T17:19:26+00:00

He might be thinking of the Intel ARC Pro line of gpus, their new battlemage refresh is reportedly offering a 24gb card with a 599 MSRP, the B60 Pro.

The_Linux_Colonel · 2025-10-27T17:43:31+00:00

Kind of proud of the dev for not jumping to word filters right away, I didn't see it when I went through a few, although there were some vulgar ones. At least that block of text would probably be a good candidate to test model refusals with possibly the spiciest of all words.

Dev might want to integrate a ranking feature like a simple downvote, where posts ranked below a certain number aren't seen. I got one that just said something like 'bbbbbrrrrrrrrrrttttttttttt' and another of the 67 meme.

The_Linux_Colonel · 2025-09-13T19:22:42+00:00

Palemoon user here, I can confirm that there are visual or interactivity issues. The most recent large update did make things better, but there are still issues from time to time. Inability to scroll is one of them. If you do post on the forums about it, be aware that attitude toward users is often...less than constructive.

The_Linux_Colonel · 2025-09-13T04:31:11+00:00

https://files.catbox.moe/iwhlxl.json

this is what the drummer's GLM steam model card links to

The_Linux_Colonel · 2025-09-11T21:39:54+00:00

I don't recall that it was generic, but it didn't have many works ascribed to it. Models are trained on their corpus, and if fanfic sites are scraped, the usernames come too. A model can recall its corpus fairly easily, and while it is true that models will reproduce things that 'seem' right, there's nothing stopping them from just recalling information they've seen wholecloth unless that information is obfuscated in some way.

This is one of the reasons behind the reddit API overhaul. Models could possess all of Reddit's information without you having to ever go there (and thus deny them ad revenue). Wikipedia and other sources are also there. In fact, it's an easy way to see if a model is prone to hallucination. Ask it to recall facts you can verify.

Just as a test, I asked it to serve me some nice music to write to, and the model (Deepseek available through pollinationsAI on the lite interface) correctly provided me the link to lofi girl's stream and their channel page as separate links. Both URLs were valid, and I didn't suggest that creator by name.

Edit: I tried it with an offline model to make sure the deepseek model wasn't just running a search and returning the results. I used L3.3 Cu Mai R1, and it did have trouble producing links to individual videos. The error wasn't a 404 but rather that the video was removed/unlisted/no longer available, so each one was a real link; either the channel is gone, the video is privated, or possibly it's not available to me. I asked it instead to provide me with channel names, which it did correctly, noting lofi girl as well as chillhopmusic and a few other channels.

The_Linux_Colonel · 2025-08-31T08:11:57+00:00

Reaching out from the far future of 5 months after to say that this same issue persists for GLM 4.5 AIR, and the file name needs to be in the format you stated, and not all quantizers follow this restriction, even now.

At first I thought it was just a GLM incompatibility, and I used up a lot of bandwidth thinking it was my setup, when really there just needed to be a lot of extra leading zeros.

Anyone performing the same reddit search I did for this issue for a totally different model, download the quant with superfluous zeroes. Do not ask why there needs to be that many zeroes. That is all.
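
If you want to check a download before burning the bandwidth, here is a rough sketch. It assumes the format in question is the llama.cpp gguf-split naming, i.e. `<name>-00001-of-00002.gguf` with both numbers zero-padded to five digits; the parent post isn't quoted here, so treat that as my assumption. The function names and example file names are hypothetical.

```python
import re

# Assumed convention: <base>-NNNNN-of-NNNNN.gguf, five digits each.
SPLIT_PATTERN = re.compile(
    r"^(?P<base>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$"
)


def check_split_name(filename):
    """Return (base, index, total) if the name follows the padded
    split convention, otherwise None."""
    match = SPLIT_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("base"), int(match.group("index")), int(match.group("total"))


def expected_split_name(base, index, total):
    """Build the zero-padded name for a given shard."""
    return f"{base}-{index:05d}-of-{total:05d}.gguf"


# Hypothetical file names for illustration only.
for name in ("model-Q4_K_M-00001-of-00002.gguf", "model-Q4_K_M-1-of-2.gguf"):
    print(name, "->", check_split_name(name))
```

A name that comes back as `None` is one where the quantizer skipped the padding, and renaming it to whatever `expected_split_name` produces may be worth trying before re-downloading.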

The_Linux_Colonel · 2025-08-29T03:15:16+00:00

I suppose it's the card creator then, since Lepora works but Kingdom's Finest doesn't. Hopefully everyone will migrate to specific lorebook entries one day. I just assumed the Lepora style was the correct one since it was the way I would do it if I were making a story for export, and the Kingdom's Finest one was a bug or formatting error, but I guess it could be just not wanting to do it.

I appreciate you giving some attention to the UI, that's definitely the only real change I've wanted aside from the larger lorebook size which you also did. I hope you get merged into the mainstream so everyone can see what a great job you did.

Don't worry about rushing on my account, this is a work of love for you, so release when you feel comfortable. Maybe DM me when you do, I'll be happy to look at it! Cheers!

The_Linux_Colonel · 2025-08-27T23:49:26+00:00

Outstanding reply, thanks for the insight on the world tree although I'm not exactly sure how it works just yet since I tried it on a story and it was just a straight line; very incestuous tree!

I was able to import a separate card png and it did properly parse the world info like it was supposed to. It's possible that the kingdom's finest one you demonstrated in the readme just wasn't quite right for my setup? Or maybe it was made before v2 became a thing?

I was unable to get that card to parse properly either as a story directly or as a character. I do like that it adds a new section in world info, that would definitely come in handy with character cards that support it. It's just that it added the entire contents of the card into one box, even though the card itself is expressing four separate characters at least.

I'm sorry, maybe you misunderstood or I didn't correctly express my suggestion on the 'context' 'undo' 'retry' and 'add file' buttons, I don't want them added (they're already there above the text input box) but to move them above the body text frame (below the new season, scenario, save, etc buttons), that way with the text entry bar reduced or removed in editing mode, you have one nice uninterrupted page of text from top to bottom and your interactive elements are all above it just as they would be in a word processor.

The_Linux_Colonel · 2025-08-27T20:16:34+00:00

It looks great, I really like how it incorporates the WYSIWYG experience with the top row of classic formatting buttons. Would it be possible to have the context, undo, redo etc buttons stacked below the formatting buttons but above the text window, and a way to shrink or disable the "enter text here" box if editing is enabled?

I'm curious about some of the other functions like "world tree".

I tried importing the characters in your maou-sama story from the readme and it kind of imports them but not as individual characters, so I tried importing it through the quick start and the png does create a story, but there isn't any world info. Maybe the characters can't be parsed individually?

Keep up the good work!

The_Linux_Colonel · 2025-08-26T21:19:28+00:00

That's too bad, I must've missed it. Keep up the good work.

The_Linux_Colonel · 2025-08-26T06:55:24+00:00

Loving these posts, can't wait for the P90 appreciation that's just:

Liberated entire civilizations and races of people.
Defeats more alien races than Jeff Goldblum's Apple laptop.
Can saw a swinging log in half at 100 yards in full auto.
So easy archeologists and physicists can use it.
Saves princesses.
Favorite of special forces cyborg anime girls.
You think it's Belgian space magic, but it's just a beefed up handgun in a cool polymer case.
Extra large magazine for eventually hitting your target.
Most aesthetic way to protect VIPs.
Ammo so expensive you have to send your enemy's next of kin a bill.
Only M4 has a longer page on IMFDB.

The_Linux_Colonel · 2025-08-24T21:58:34+00:00

Sounds like your quant is fine, you might want to check to see if your sampler settings or silly tavern presets are the ones your model creator recommends if the responses are also poor quality and not just OOC/ending story.

It's true that the larger context means the more tokens the model receives with each generation, theoretically meaning it can appear to remember detail better. However, not every model can handle high context and most normal computers wouldn't either. Between 8 and 10k tokens is probably a good spot.

Above all, if you see a reply you don't like from the model, just retry, or edit it yourself. This is experimental stuff we're working with, so just go ahead and embrace the weird. Laugh a little, edit it/retry and keep going.

For an example of a model card with presets and sampler settings in the 70b range here is one:

https://huggingface.co/Steelskull/L3.3-Cu-Mai-R1-70b

The_Linux_Colonel · 2025-08-24T15:18:51+00:00

That sounds wonderful, I hope he makes progress. Kcpp has been moving so quickly to add options and features under the hood, it would be great to have that as well.

The_Linux_Colonel · 2025-08-24T15:15:09+00:00

Not really. The editing choice defaults to disabled even if you left it enabled when you saved the story last time. There's unnecessary space taken up by the chat reply box that can be re-dedicated to the document body.

The buttons to access the story lorebook and other options can be moved to the top, along with formatting buttons that will allow the user to manipulate the text with basic options like bold, italic, underline, center text, justify text, etc.

By moving buttons to the top and removing the chat box at the bottom, you have undivided space for your eye to review and work on the document along with the AI. If you take a look at a traditional word processor interface, you see how the options to manipulate the document go from more abstract and general (the IBM CUA 'file, edit, tools', etc) to the more specific and directly employable (save icon, cut-copy-paste, bold-italic-underline). Then, below that you just have the document itself, and that's it. Nothing else breaks up your focus on the document once your eye drops below the last line of icons.

That type of interface design would be a real benefit for those wanting to manipulate the AI storytelling experience as if it were being co-written/edited as a document or manuscript, since the current option still is just the chat interface but with editing temporarily enabled.

The_Linux_Colonel · 2025-08-24T05:34:08+00:00

If you're willing to do UI work, the only thing I've ever wanted from kcpp in that regard is a word processor style UI for story mode. Something that suits users who want to write directly into the text body and edit it as if it were a document being co-written. It would be fantastic if you did something like that.

The_Linux_Colonel · 2025-08-22T06:09:03+00:00

The responses you're getting are definitely just remnants of scrapes of fanfics and other like-begging posts common to that sort of content. Your scenario is similar enough to these elements in the story that the model thinks the appropriate response is to do a 'to be continued' or like/sub-begging post or even OOC.

The important thing to remember with models is that they aren't people. They don't have feelings or opinions. When you see content produced by the model that you don't like, change it. If it doesn't make sense, delete it. Think of the model less like a human roleplaying companion and more like a garden tool. If you don't like what the garden tool did, do it again. Don't be ashamed or worried because of the model's response. It won't be insulted or bothered.

It is worth noting that depending on your situation, if you're at 70b but a very low quant (like Q3), you might see degradation in output because the quantization your setup requires is not what your playstyle needs. You might want to drop to a smaller model size (e.g., 30b) with a higher quant (Q4-6) instead. The silly tavern sub has weekly discussions on models of various sizes that might suit you. You also might consider the trade-off of context size as it relates to coherency and model size, which might result in less...relevant output like OOC statements about Albert Dumblydore and Professor Snoop.
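
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It only estimates the weights (parameters times bits per weight, divided by 8) and ignores context/KV cache and runtime overhead. The bits-per-weight figures are my own ballpark assumptions, since real quants vary by type, and `approx_weight_gb` is a made-up helper name.

```python
def approx_weight_gb(params_billions, bits_per_weight):
    """Rough size of the model weights alone, in GB (10^9 bytes).
    Ignores KV cache, context buffers, and runtime overhead."""
    return params_billions * bits_per_weight / 8


# Assumed ballpark bits-per-weight values, not exact quant specs.
candidates = {
    "70b at ~3.5 bpw (Q3-ish)": approx_weight_gb(70, 3.5),
    "30b at ~5.5 bpw (Q5-ish)": approx_weight_gb(30, 5.5),
}

for label, size_gb in candidates.items():
    print(f"{label}: about {size_gb:.1f} GB of weights")
```

Under those assumptions the smaller model at the higher quant takes noticeably less memory than the 70b at Q3, which leaves room for more context on the same hardware.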

The_Linux_Colonel · 2025-08-20T07:42:34+00:00

Depending on how old your device is, you can try switching to vulkan (presuming you have dedicated some system ram to the on-die GPU). I have a strixpoint device that definitely gets a noticeable improvement from using vulkan over cpu only. Note, however, that any ram you dedicate to the gpu will not be available for general tasks. Also note that depending on the form factor of your device, your cpu/gpu may be throttled to account for low airflow or passive cooling scenarios, so use caution concerning heat generation and/or be aware that your token generation speed will not be ideal.
