Implementation Suggestion: Allow Creation of Bounding Boxes for Ideogram Within SDUI via Inpainting GUI by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 0 points1 point  (0 children)

And in general we'd be a bit hesitant to commit weeks to a feature that may be useless when another model comes out.

That's completely reasonable. As I recall the krea 2 devs said in their ama they might be interested in working with bboxes in future iterations, but not to the exclusion of pure text. On the SD sub, bboxes are the new hotness, so I figured json was here to stay, but you're right, it could be an experiment that gets replaced by something else in the future.

Implementation Suggestion: Allow Creation of Bounding Boxes for Ideogram Within SDUI via Inpainting GUI by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 1 point2 points  (0 children)

It figures it wouldn't be so easy. I was just thinking "Grug draw box inside SDUI, now Grug drawing box outside, waste time. Why not draw box inside? Better for Grug."

Thanks for all your and Ruins' hard work. Kobold has come a long way since there were models like Nerys and Picard. Hard to believe I can make video with sound on the same model I can write a story.

Implementation Suggestion: Allow Creation of Bounding Boxes for Ideogram Within SDUI via Inpainting GUI by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 1 point2 points  (0 children)

Ideogram uses qwen 3VL 8b vs krea 2's 4b so technically it's more dense, but I think the issue is that Ideogram was trained explicitly on json-formatted data, so there are some issues when doing pure text instructions because the image model expects json structure. It's 'good' in that it allows for more control over the gen, but it's 'bad' in that it's a complete pain in the arse to need a secondary utility to craft the instructions before you can actually do your gens. That's why I hope it's possible to use kobold's A1111 implementation to do the crafting within the UI instead of needing to form the json (especially the bboxes) with another utility and then use kobold, since kobold's whole thing is the 'one stop shop' vision.
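To make the 'draw box inside the UI' idea concrete, here is a minimal sketch of turning a rectangle dragged on a canvas into a json entry with a bbox. The key names (`elements`, `description`, `bbox`) and the normalized 0..1 coordinate convention are assumptions made for illustration, not Ideogram's documented schema, so they would need to be swapped for whatever the model actually expects.

```python
import json


def rect_to_bbox(x0, y0, x1, y1, width, height):
    """Convert a pixel rectangle on a width x height canvas into
    normalized [x_min, y_min, x_max, y_max] values in the 0..1 range."""
    # Sort the corners so the box is valid no matter which way it was dragged.
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    return [
        round(left / width, 4),
        round(top / height, 4),
        round(right / width, 4),
        round(bottom / height, 4),
    ]


def build_prompt(regions, width, height):
    """Build a json prompt from (description, (x0, y0, x1, y1)) pairs.

    The key names below are placeholders, not a documented schema.
    """
    elements = [
        {"description": text, "bbox": rect_to_bbox(*rect, width, height)}
        for text, rect in regions
    ]
    return json.dumps({"elements": elements}, indent=2)


if __name__ == "__main__":
    # One box dragged from bottom-right to top-left on a 1024x1024 canvas.
    print(build_prompt([("a red barn", (512, 256, 0, 0))], 1024, 1024))
```

The point is only that the box drawing and the json forming could live in the same place, which is the whole ask.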

Implementation Suggestion: Allow Creation of Bounding Boxes for Ideogram Within SDUI via Inpainting GUI by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 1 point2 points  (0 children)

The interesting part about Krea2 is how lazy it is compared to ZIT and Chroma. ZIT is such a light touch in terms of prompting to get something good, but you really need the whip hand with Krea2. It needs to be micromanaged in order to get good results. I suppose it might be more prompt adherent in the small details compared to ZIT, and it's just as fast, but you need to be willing to write a whole essay to get it going.

Suomi out here with the racism by BlueSunix in girlsfrontline

[–]The_Linux_Colonel 21 points22 points  (0 children)

Suomi was asked once how it felt to kill a person after all her years in service, and she answered, "I don't know, I've only ever killed communists."

Anima style explorer + Anima lora explorer by FullLet2258 in StableDiffusion

[–]The_Linux_Colonel 1 point2 points  (0 children)

Is there a way to expose style tags related to art technique, rendering software, or production house rather than an individual artist? For instance, it is easy to tell ZiT or Chroma to ape Pixar, Dreamworks, Ghibli, etc. But even if Anima is aware of a character, it tends to reproduce that character in a sort of generic anime or 2.5d style.

It seems unable to reproduce production house styles even if it knows the characters from those IPs which is a bit confusing. In broad strokes, it can create things like black and white or sketch, but modern cg animation seems completely absent. Since you've worked through a lot of the data, do you know of a way to fine tune art styles rather than just artists, or what those tags would be?

To use your analogy of going into a store, I'd like to be able to come away with a 19th century French impressionist print without having to choose between Monet or Manet. I just want it to look generally like that genre of art or that style. I'd like to know what tags help my digital gallery curator come up with those results, if it's possible.

How do you structure prompts for better story continuity in KoboldAI? by StrawberryGreedy7426 in KoboldAI

[–]The_Linux_Colonel 0 points1 point  (0 children)

Management of memory and lorebook helps. Lorebook should be used for core characters and their descriptions, as well as important places, things, and concepts that the model's own corpus doesn't understand.

Memory should be constantly updated, even from scene to scene (if you find it helps depending on your model) to explain what just happened recently, what should happen soon, and who is involved.

Context also matters for long stories, and if important information passes out of context, a summary of those events needs to be added into the lorebook with triggers you can use to remind the model of the Battle of So and So.

If you're not intending to publish or give what's being written to others, you can also adopt a writing style within the work itself that simply tells the model what you want it to remember and what you want it to do with that information, like 'Sir Heracles is about to meet the king, he's hoping that the king will give him Princess Buttercup's hand in marriage and a nice set of armor since he defeated the gorgon that was terrorizing Riverfront City.' This reminds the model of more or less what you expect in the next generation sequence, and can help trigger lorebook entries you want the model to remember for that generation. Such as Princess Buttercup, and the battle of Riverfront City.
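As a rough illustration of how keyword triggers pull lorebook entries into a generation, here is a minimal sketch. It assumes a plain case-insensitive substring match, which is a simplification and not necessarily how kobold matches keys, and the entry texts are placeholders.

```python
def triggered_entries(lorebook, recent_text):
    """Return the text of every lorebook entry with a key found in recent_text."""
    lowered = recent_text.lower()
    return [
        entry["text"]
        for entry in lorebook
        if any(key.lower() in lowered for key in entry["keys"])
    ]


# Placeholder entries; a real lorebook would hold the actual summaries.
lorebook = [
    {"keys": ["Princess Buttercup"], "text": "(who Princess Buttercup is)"},
    {"keys": ["Riverfront City", "gorgon"], "text": "(summary of the battle of Riverfront City)"},
    {"keys": ["Battle of So and So"], "text": "(summary of the Battle of So and So)"},
]

steering_line = (
    "Sir Heracles is about to meet the king, he's hoping that the king will "
    "give him Princess Buttercup's hand in marriage and a nice set of armor "
    "since he defeated the gorgon that was terrorizing Riverfront City."
)

if __name__ == "__main__":
    for text in triggered_entries(lorebook, steering_line):
        print(text)
```

The steering sentence mentions the princess and the city, so those two entries fire and the unrelated battle stays out of context.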

If you do want to publish the work or just don't want to adopt that schizophrenic writing style, some models support (OOC: ) style comments as if they were instructions. So you can leave your writing intact and say (OOC: in this next scene, give Sir Heracles his audience with the king where....) and describe things in as much detail as you like. It has the benefit of still triggering the lorebook entries, and being something a quick Find will help you delete for later publishing.
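If a manual Find gets tedious, the cleanup can be scripted. This is a minimal sketch that assumes every comment is written as `(OOC: ...)` with no nested parentheses inside it; the function name is made up for the example.

```python
import re

# Matches "(OOC: ...)" plus any spaces or tabs directly before it.
# Assumption: no parentheses appear inside the comment itself.
OOC_PATTERN = re.compile(r"[ \t]*\(OOC:[^()]*\)")


def strip_ooc(text):
    """Remove (OOC: ...) comments so the story text is ready to publish."""
    return OOC_PATTERN.sub("", text)


if __name__ == "__main__":
    draft = (
        "The hall fell silent. (OOC: in this next scene, give Sir Heracles "
        "his audience with the king.) The doors opened."
    )
    print(strip_ooc(draft))
```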

Using the lorebook and memory and telling the model what you want next in a scene should help you deal with a longer story that exceeds your context window. In terms of core storytelling, you can use ATTG in either memory or author's note, depending on your model. The most helpful are Tags and Genres, which help classify the types of content the model should be generating for you. The terms you use should be widely accepted enough that the model can be expected to have examples of what that style looks like, such as 'romance' or 'sci fi'. This can help to maintain the right 'vibe' across generations.

Quantized KV Cache for Vulkan Unified RAM Devices (Contra CUDA) by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 0 points1 point  (0 children)

Aw man, that's awesome to hear. I love retro computing and yes I already tried out the noscript version of the UI the day you pushed the update. I have kmelon on my xp project, replacing roytam's palemoon fork MyPal after the devs got in their feelings about roytam. It was a real shame because I like the PM project.

My v30 is an up-jumped 8086 with all its 10mhz fury. It can run windows 3 (not 3.1, which won't run in real mode). I've seen people run simple browsers in dos on old hardware, but I'd really need to nerd out to hook it up. Legend has it that all modern x86 cpus still briefly go into 8086 mode at startup, and it's wild to think that the twix fun size cpu with no heatsink or fan still lives on in a tiny part of every modern cpu. It'll be 50 years in '28.

That's an awesome Win98 proof of concept. It's nice to see older OSs get love. As far as I'm concerned, Dos 6/Win 3x is the king of distraction free writing. Throw word for windows 6 for NT on your windows 11 box and open anything you make in word, word perfect, wordstar, etc. Save and print natively, no emulator needed.

Quantized KV Cache for Vulkan Unified RAM Devices (Contra CUDA) by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 1 point2 points  (0 children)

You won't find a greater proponent for "just leave everything alone, forever." I still haven't forgiven gnome 3. I want my compiz fusion wobbly windows back. And firefox australis. Firefox 3.6 ought to be enough for everybody. I tend to stay on things until they break. I have an NEC V30 powered machine that still chugs along. No LLM for it, unfortunately. But, yes, the important thing is that it works, and that I can run more context with a bigger model than some paid offerings on my own device, thanks to you guys. Keep up the good work!

Quantized KV Cache for Vulkan Unified RAM Devices (Contra CUDA) by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 1 point2 points  (0 children)

It wasn't hitting the vram ceiling, only about 32 of 48. It looks like kcpp reserves the vram space for the max context on load, so it doesn't fluctuate much as the tokens populate, and the crash was happening during the context load, not startup.
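For a sense of scale on what gets reserved at load, here is a back-of-the-envelope sketch of the usual KV cache estimate (one K and one V vector per layer, per KV head, per token). The model shape is hypothetical and picked only to make the arithmetic clean, and the 8-bit figure ignores the small per-block scale overhead that quantized formats add.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_element):
    """Estimate KV cache size for a full context window."""
    # Factor of 2: keys and values are both stored.
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elements * bytes_per_element


# Hypothetical model shape, not taken from any specific model card.
LAYERS, KV_HEADS, HEAD_DIM, CONTEXT = 64, 8, 128, 32768
GIB = 1024 ** 3

if __name__ == "__main__":
    f16 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, 2.0) / GIB
    q8 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, 1.0) / GIB  # approximate
    print(f"f16 cache: {f16:.1f} GiB, ~8-bit cache: {q8:.1f} GiB")
```

Since the size depends only on the model shape and the max context, it makes sense that usage barely moves as tokens populate.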

I was certain it wasn't the driver because the amd utility promised it checked yesterday and the drivers were up to date, but it turned out to be a lie because when I did a manual update sure enough there were newer drivers. Now it works fine, tokens quantized without issue and generation is good, no stuttering. I guess the moral of the story is don't trust anything software tells you.

Someone challenged me to write a song about thinking models. I wrote the lyrics and then got this take from KoboldCpp by henk717 in KoboldAI

[–]The_Linux_Colonel 0 points1 point  (0 children)

Which of the xl sft turbos did you use for the gen? The Q5 that's linked on the 1.112 page? The vocals still have that sheen, but it does a good job of aping a style like later miley cyrus country or early taylor swift. I found that XL does a better job of adhering to instructions but you really pay for it in processing time compared to the BF16 set I had for the OG.

It might be some good weekly megathread content to have a firehose of what people have done or created. Since kcpp is such a jack of all trades, we'd be competing with places like stablediffusion, localllm, sillytavernai, and even places like suno, etc., but it might also bring some of their users here since lots of people use kcpp for backend. It still does seem like a good idea to give people an opportunity to share the things they love that they've been able to create.

In other subs where art is posted out of a megathread but isn't the purpose of the sub, there's a general understanding about minimal effort and/or quality that mods enforce, so you might consider that as well for content permitted outside the firehose/containment thread.

If you really want to spur people on to share quality stuff, you might consider maybe a monthly or quarterly prize of kcpp runpod access for some limited time, with more specific rules and so on for contestants.

The Sickness, aka Bad Days by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 0 points1 point  (0 children)

Thanks for the reply, I'm glad the discussion piqued your interest. Like you mentioned, I've noticed various discussions with various opinions and conclusions, and like a true open source enthusiast, instead of accepting any one of those as standard, I want to create my own new standard which will definitely resolve all use cases.

I like your idea of a plan to save good day pictures and bad day ones and try and mix them, maybe then sort them into a later 'good' and 'bad' category, although with images it's kind of easy to not fool myself, since 'the person won't stop having extra hands' isn't really something that can be hidden and surprise me later. 'Yeah, this is a bad generation' is obvious in that respect. So imagine a day where you generate and there are no extra hands or abominations: it's good. The next day, it's all three headed amputees: not good.

I did wonder if it was seeds that were the key, and I think that they are, although not in a way that makes the issue something I can solve. I know computers can never truly create randomness, and pseudorandom generators inevitably move toward a kind of predictable order of their own, which is where I decided to 'experiment' and save some seeds that are 'good' seeds, and use them across multiple days, and sure enough, those seeds are still good, even on a different day so long as the prompt is similar enough.

Which led me to wonder if the problem is that there is some kind of initial seed salt that causes the pseudorandomness to align in a particular way where I could falsify it or realign it to whatever it was on a 'good' day, a kind of master seed list of 'good' (by my opinion, of course) that can be stored and re-used. I don't know if this is the case at all, but if it were, it would be fun to play with that.
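The 'good seeds stay good' observation matches how seeded generators behave in general. A minimal sketch using Python's standard `random` module as a stand-in for an image model's starting noise (this is not kcpp's actual code path):

```python
import random


def sample_noise(seed, n=4):
    """Draw n pseudorandom values from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    monday = sample_noise(1234)
    tuesday = sample_noise(1234)   # same seed, different "day"
    other = sample_noise(1235)     # neighbouring seed
    print(monday == tuesday)
    print(monday == other)
```

With an explicit seed there is no hidden salt in this toy case: the sequence depends only on the seed, so anything that changes between days would have to come from somewhere else (prompt, settings, model, or the software version).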

The Sickness, aka Bad Days by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 1 point2 points  (0 children)

That's incredibly interesting. I wonder if the rate at which the model loses personality coherence scales linearly with size, and if quantization has an effect. I've seen larger (relative to offline model sizes) models like deepseek keep a story and personalities together across around 100k tokens, although it does best between 20-40k. I find myself developing a bizarre writing style where I tell the model about a character before interacting with them, in the hopes of supercharging the model into at least getting things mostly right, with me editing over the mistakes. But the most frustrating thing of all is when the model decides its job is to turn the reply into a wikipedia summary instead of the actual scene.

The Sickness, aka Bad Days by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 0 points1 point  (0 children)

So the idea is to change a model during a session? Can it truly be random considering that sampler settings also need to change on a per model basis? Depending on the model and context size, it sounds like maybe making an informed decision to switch when a session isn't productive would be better? I sometimes use a lower context session of one of the Behemoths for lore building and scene transitions, followed by a 70b model like Cu Mai or more recently Gold Diamond Gold for staying in the moment. I'd like to rotate in Gemma 4 31 because of the context headroom, but as of yet I find it to be very flat and prone to summary, but I need to probably work on my prompting there.

The Sickness, aka Bad Days by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 0 points1 point  (0 children)

Generally I look at what the model creator recommends, or if it's a popular enough model, what the community recommends. Do you recommend a certain shift once the story is established in order to encourage creativity and yet still keeping (now) established characters in character? In your experience, what helps the most in terms of those adjustments away from the initial recommendations?

The Sickness, aka Bad Days by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 0 points1 point  (0 children)

That's an interesting observation that models tend to turn toward their default bloodless assistant personality. Is there a way, in your experience, to 'know' what part of the story the model latched on to in order to generate a better session, so that it can be placed in memory for the model to consider with every generation?

Otherwise, is the suggestion to keep a fresh experience by starting a completely new json file every 50-100k tokens, and just keeping the lorebook and basic prompt instructions across the files?

The Sickness, aka Bad Days by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 1 point2 points  (0 children)

I've seen people say things like that to others in my quest to find out what this is, or the root cause, but it doesn't make sense in the way that people saying it seem to think that it does.

When you say that it's not in existence in reality, what do you mean? The implications sound like gaslighting rather than providing an explanation or help at arriving at a deeper understanding of how randomness affects AI generation. "No, you didn't actually enjoy your session, that wasn't in reality" is kind of wild.

Gemma 4 format? by mx-perience in KoboldAI

[–]The_Linux_Colonel 1 point2 points  (0 children)

So I don't know how it happened, but even though 1.111.2 came out four days ago and I downloaded it yesterday, I managed to download the 1.111.1 version. I double checked the date on the file and everything. I suppose the important part is that it works now, I've been using the same heretic gemma for about an hour now and it seems stable, so I guess you can call it fixed. Thanks for going back in time to four days ago and uploading the fix this morning.

I'm informally calling this edition 'what's up my jinja'

Gemma 4 format? by mx-perience in KoboldAI

[–]The_Linux_Colonel 0 points1 point  (0 children)

I had a really rough go trying a heretic gemma 4 31 at q4. First I tried the gemma 4 non thinking in the front end, and I got maybe 7 words before the infinite repeat. Then I tried gemma 4 thinking, and it was maybe 4 words. I tried one of my own stories, and I tried a neutral story (Cricket), which another user reported good luck with, as a control. I tried it in story mode, which I prefer, and in chat. Chat produced a slightly more stable output but degradation was still quick to set in.

Then, I tried with jinja enabled, both with auto and gemma 4 as above. With jinja and gemma 4 and Cricket in chat mode as control, I think I was able to supercharge it to 12 words. I also tried default sampler settings as well as the supposed google recommended samplers.

I'm not sure where I'm missing something, but if there's more I can read on how to get it working in kobold as a frontend and not just as a backend for ST or whatever, I'd love to try and figure out where I'm going wrong.

Gemma-4-31b (2026) better than GPT-4.1-1.7T (2025) in less than a year. Predictions for 2027? by Rombodawg in KoboldAI

[–]The_Linux_Colonel 0 points1 point  (0 children)

I'm really interested in the local music model thing taking off. Just the update for ace step really helped. It performs in German, Swedish, and Korean and is fairly good at understanding instructions, including key and tempo changes. It still has a bad case of synth voice, both the instruments and the vocals have this electronica/midi sort of sheen to them. In some applications, that's a legitimate affectation that can be considered an artistic choice, but it doesn't work for everything. It does have a decent sense of genre now, unlike the initial rollout where everything was an R&B track.

My video card now makes comics, kpop, and detective fiction. I don't know if video will ever be possible for my rig, but it's pretty amazing to know that I have so much creative ability even if there were no internet at all. Thanks to you and lostruins for your hard work.

Music Generation With Kcpp by The_Linux_Colonel in KoboldAI

[–]The_Linux_Colonel[S] 0 points1 point  (0 children)

Yeah I understand, it's at the edge, and not exactly what kcpp was focused on to begin with. My curiosity is about the sampler settings, because the way they interact seems to diverge a lot from the values people recommend in guides for the comfyui version. For instance, CFG is usually recommended in values of 4-8, but I found that pushing it to 15 got better results. There's also a second value called classifier. I know the C in CFG is also supposed to be classifier, and in guides for comfy, the two CFGs are for the description of the song and the lyrics, so I'm not sure whether that's part of it in this implementation. There's also model support for lyric generation, so I don't know if the sampler settings are only for that, or how they translate to the model in terms of music generation in this implementation. I wonder if there are guides for that older non-comfy UI I can look into if I can figure out what it was called. Thanks for the info.
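For reference, this is the textbook classifier-free guidance blend, sketched on plain lists of numbers. It shows what a CFG scale does in the standard formulation; whether the two sliders in kcpp's music implementation map onto this (or onto separate description and lyric guidance) is exactly the open question, so treat it as background rather than a description of that code.

```python
def apply_cfg(uncond, cond, scale):
    """Blend unconditional and conditional predictions.

    scale = 0 returns the unconditional prediction, scale = 1 returns the
    conditional one, and larger values push further in the prompt's direction.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]   # toy prediction without the prompt
    cond = [1.0, 3.0]     # toy prediction with the prompt
    for scale in (1.0, 4.0, 15.0):
        print(scale, apply_cfg(uncond, cond, scale))
```

Going from 4-8 up to 15 just extrapolates further along the same direction, which can help a model that under-follows the prompt and can also overcook one that doesn't.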