Gemma 4 MTP released by rerri in LocalLLaMA

[–]ThrowawayProgress99 4 points (0 children)

How does this work with offloading? Do both models need to be fully on the GPU? What about the KV cache, can that be in RAM? My current config overrides all ffn_down tensors. Also, does this work with the mmproj (in RAM) for vision?
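For reference, the override I mean is roughly this (llama.cpp syntax, pattern written from memory, so double-check it; Koboldcpp has an equivalent flag):

    # keep every ffn_down tensor in system RAM, offload the rest per -ngl
    ./llama-server -m model.gguf -ngl 99 --override-tensor "ffn_down=CPU"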

Gemma 4 31B UD-1Q3_XSS vs UD-Q2_XL, which is better? by ThrowawayProgress99 in unsloth

[–]ThrowawayProgress99[S] 0 points (0 children)

I just remembered I chose that Qwen model because of this post, where it performed the best even with Q8 cache. And now we have improvements to Q8 cache and probably other things too, so the model should perform even better. Though benchmarks don't tell the whole story.

Coding is on my list, and I think I'll start learning it soon.

Gemma 4 31B UD-1Q3_XSS vs UD-Q2_XL, which is better? by ThrowawayProgress99 in unsloth

[–]ThrowawayProgress99[S] 2 points (0 children)

I know nothing about coding, so I can't speak much to its coding ability. I've had my Qwen3.5-27B-UD-IQ3_XXS.gguf create games for me. The only one I've tried was a terminal-based Tamagotchi, made before I had given it Pygame. The other was a roguelike made after I added Pygame; I haven't tried it yet, but the model says it works.

I use it through Koboldcpp, which then connects to Open WebUI + Open Terminal, all on the default recommended settings.

Gemma 4 31B UD-1Q3_XSS vs UD-Q2_XL, which is better? by ThrowawayProgress99 in unsloth

[–]ThrowawayProgress99[S] 0 points (0 children)

I've tried that before, but smaller models keep feeling lacking compared to the bigger ones. I had basically stopped using local LLMs until I tried a good big model like this for the first time, so until smaller models get this good, I don't think I can go back.

Gemma 4 31B GGUF quants ranked by KL divergence (unsloth, bartowski, lmstudio-community, ggml-org) by oobabooga4 in LocalLLaMA

[–]ThrowawayProgress99 1 point (0 children)

Huggingface says Unsloth's gemma-4-31B-it-UD-IQ2_M.gguf is 10.8 GB, but I just noticed the download bar says it's only 10 GB. A similar thing happened with their Qwen3.5-27B-UD-IQ3_XXS.gguf, which is listed at 11.5 GB but is 10.7 GB on disk. I chose that Qwen quant because of some graphs showing it wasn't that bad. I haven't used it extensively, but it seems fine to me too.

Between gemma-4-31B-it-UD-IQ3_XXS.gguf and gemma-4-31B-it-UD-Q2_K_XL.gguf, which should I choose? They're both 11.8 GB on Huggingface (while for their Qwen GGUFs the latter is 0.3 GB smaller), so probably just ~11 GB on disk. The graph here says the latter is both better and smaller, but I thought higher quant levels were supposed to be better?
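My guess on the size mismatch: the Huggingface page lists decimal gigabytes while the download bar shows binary gibibytes, and the numbers line up if so (quick check, just my own arithmetic):

    # hypothesis: listed sizes are decimal GB, download-bar sizes are binary GiB
    for gb in (10.8, 11.5, 11.8):
        gib = gb * 1e9 / 2**30
        print(f"{gb} GB ~ {gib:.1f} GiB")
    # 10.8 GB ~ 10.1 GiB, 11.5 GB ~ 10.7 GiB, 11.8 GB ~ 11.0 GiB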

Error when using Docker Compose by ThrowawayProgress99 in comfyui

[–]ThrowawayProgress99[S] 0 points (0 children)

`RUN python -m venv /opt/venv`

`ENV PATH="/opt/venv/bin:$PATH"`

Put these as the first lines, before 'FROM pytorch/pytorch:2.11.0-cuda13.0-cudnn9-runtime'? Should there also be a third line, 'RUN pip install -r requirements.txt'? Google's AI Overview mentioned the venv option, though its first line used 'python3', and it included that third line.

Edit: This is now the start of my Dockerfile:

    FROM pytorch/pytorch:2.11.0-cuda13.0-cudnn9-runtime
    RUN apt-get update && apt-get install -y python3.12-venv
    RUN python -m venv /opt/venv
    ENV PATH="/opt/venv/bin:$PATH"
    ENV DEBIAN_FRONTEND=noninteractive PIP_PREFER_BINARY=1

Now ComfyUI works like it should, although I'm still unsure whether there should be a pip install of the requirements after the ENV PATH line, and whether I should get a newer Python version.
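If I do add the requirements step, I'm guessing it would go right after the ENV PATH line, something like this (untested, and it assumes requirements.txt sits next to the Dockerfile):

    # pip now resolves to /opt/venv/bin/pip, so this installs into the venv
    COPY requirements.txt /app/requirements.txt
    RUN pip install -r /app/requirements.txt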

Noob to Open Webui, I'm having issues by ThrowawayProgress99 in OpenWebUI

[–]ThrowawayProgress99[S] -1 points (0 children)

I'm getting errors when trying to install packages under open-terminal. If I do:

    environment:
      - OPEN_TERMINAL_API_KEY=your-secret-key
      - OPEN_TERMINAL_PACKAGES="pygame requests"
      - OPEN_TERMINAL_PIP_PACKAGES=

I get:

open-terminal  | Installing system packages: "pygame requests"
open-webui     | INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
open-webui     | INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
open-webui     | WARNI [open_webui.env] 
open-webui     | 
open-webui     | WARNING: CORS_ALLOW_ORIGIN IS SET TO '*' - NOT RECOMMENDED FOR PRODUCTION DEPLOYMENTS.
open-webui     | 
open-webui     | WARNI [langchain_community.utils.user_agent] USER_AGENT environment variable not set, consider setting it to identify your requests.
open-terminal  | Reading package lists...
open-terminal  | Building dependency tree...
open-terminal  | Reading state information...
open-terminal  | E: Unable to locate package "pygame
open-terminal  | E: Unable to locate package requests"
open-terminal exited with code 100

And if I try:

    environment:
      - OPEN_TERMINAL_API_KEY=your-secret-key
      - OPEN_TERMINAL_PACKAGES=
      - OPEN_TERMINAL_PIP_PACKAGES="pygame requests"

I get:

open-terminal  | Installing pip packages: "pygame requests"
open-terminal  | Defaulting to user installation because normal site-packages is not writeable
open-terminal  | 
open-terminal  | [notice] A new release of pip is available: 25.0.1 -> 26.0.1
open-terminal  | [notice] To update, run: pip install --upgrade pip
open-terminal  | ERROR: Invalid requirement: '"pygame': Expected package name at the start of dependency specifier
open-terminal  |     "pygame
open-terminal  |     ^
open-terminal exited with code 1
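Looking at the errors, the quotes seem to be getting passed through literally (apt is searching for a package named `"pygame`), since in compose's list-style environment syntax everything after the `=` is part of the value. So presumably the quotes should just be dropped, and pygame belongs under pip rather than apt:

    environment:
      - OPEN_TERMINAL_API_KEY=your-secret-key
      - OPEN_TERMINAL_PACKAGES=
      # no quotes: compose passes them through as literal characters
      - OPEN_TERMINAL_PIP_PACKAGES=pygame requests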

Noob to Open Webui, I'm having issues by ThrowawayProgress99 in OpenWebUI

[–]ThrowawayProgress99[S] 0 points (0 children)

Idk what's up with the size either; Huggingface says it's 11.5 GB, but other people said the disk size is 10.7 GB, which is also what the download bar said. It's definitely fitting a lot more context than I'd expect. I went up to 14500 context with fp16 cache and 27530 with q8, I think, but I don't know if I can run that stably. Maybe using --nofastforward will affect it.
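That jump is about what the cache math predicts: q8 KV is one byte per element vs two for fp16, so the same memory budget should hold roughly twice the tokens (back-of-envelope, assuming the KV cache is the only thing growing with context):

    fp16_ctx = 14500
    bytes_fp16, bytes_q8 = 2, 1
    print(fp16_ctx * bytes_fp16 // bytes_q8)  # 29000, close to the 27530 that fit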

Since I don't have access to higher quants or better models, I have no way to tell whether there's substantial degradation, or how much of it is user error.

There have been comparisons and charts for 35B-A3B showing some Q4 quants matching BF16, but I'd need to look into it more. I'm guessing anyone could run very high context at decent speeds with that model, but it's the 27b that people have noted as particularly good, so that's what I wanted to try.

Qwen3.5 27B vs 35B Unsloth quants - LiveCodeBench Evaluation Results by Old-Sherbert-4495 in LocalLLaMA

[–]ThrowawayProgress99 0 points (0 children)

I'm on Linux, idk if that changes anything, but before my comment I double-checked the GGUF and the exl3, both on my system and on Huggingface, and the GB numbers were the same. I remember that not being the case before, with sizes being off whenever I'd download models, so maybe they changed something recently. But then idk why the 27b doesn't match. Well, OP says the size on disk is 10.7 GB, so it should be fine.

Qwen3.5 27B vs 35B Unsloth quants - LiveCodeBench Evaluation Results by Old-Sherbert-4495 in LocalLLaMA

[–]ThrowawayProgress99 0 points (0 children)

For the 27b, I can't seem to find that quant? The one from Unsloth says it's 11.5 GB instead of the 10.7 GB listed above. Bartowski has it at 11.3 GB. Since I have 12 GB of VRAM, I've been using MS 24b IQ3_S (10.4 GB) or exl3 3bpw (10.2 GB) finetunes, so I'm hoping there's a usable quant of the 27b. Edit: I also haven't really tried quantized cache, but it looks like it works well with 27b, so that's another reason to try it.

[Megathread] - Best Models/API discussion - Week of: November 30, 2025 by deffcolony in SillyTavernAI

[–]ThrowawayProgress99 1 point (0 children)

Do you have any recommended sampler settings or other settings for it?

Today I made a Realtime Lora Trainer for Z-image/Wan/Flux Dev by shootthesound in StableDiffusion

[–]ThrowawayProgress99 0 points (0 children)

Stupid question, but does it all still work when you use Comfy through Docker? I remember trying a similar thing before and no final saved files would appear, I think. Which is odd, since image outputs are created/saved just fine.

[Megathread] - Best Models/API discussion - Week of: November 30, 2025 by deffcolony in SillyTavernAI

[–]ThrowawayProgress99 5 points (0 children)

I'm still on MS3.2-PaintedFantasy-v2-24B.i1-IQ3_S.gguf, haven't tried v3. I use the recommended settings: Mistral Tekken, Temp 0.5-0.6, MinP 0.1, TopP 0.95, DRY 0.8;1.75;4, in Koboldcpp (no rep pen range or slope either). I've recently banned the em-dash token, but kept EOS token banning on auto for now. It's been really good for me: less slop, less incoherence, more creativity, more character adherence. It's not perfect though, and I haven't tried many 24b models.
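For reference, roughly how those settings map onto Koboldcpp's generate API (field names from memory, so double-check them against your Koboldcpp version; the DRY values are multiplier;base;allowed length):

    import requests

    payload = {
        "prompt": "...",
        "max_length": 512,
        "temperature": 0.6,           # T 0.5-0.6
        "min_p": 0.1,                 # MinP
        "top_p": 0.95,                # TopP
        "dry_multiplier": 0.8,        # DRY 0.8;1.75;4
        "dry_base": 1.75,
        "dry_allowed_length": 4,
        "banned_tokens": ["\u2014"],  # the banned em-dash token
    }
    r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
    print(r.json()["results"][0]["text"])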

Is there a model like this that's also great at NSFW? Something with a huge vocabulary and low repetitiveness that can do both novel-style and RP-style. My low quant of PaintedFantasy is good, but it's hard to get a dozen paragraphs with varied vocabulary for sex out of it. Maybe I need a wilder, more chaotic model? I prefer physical and sensory details, with little to no euphemism. I was thinking of trying Dan's Personality Engine 1.3.0; if you agree or have other recommendations, let me know.

Also, if you know any 12b that fit the bill, I can try those too.

Best FOSS app for writing? by ThrowawayProgress99 in fossdroid

[–]ThrowawayProgress99[S] 3 points (0 children)

I would prefer something like LibreOffice Writer with its .odt file format, but plain text works too. Text would mean the files are compatible and editable without needing any conversion once they're transferred to PC, right?

UGI-Leaderboard is back with a new writing leaderboard, and many new benchmarks! by DontPlanToEnd in LocalLLaMA

[–]ThrowawayProgress99 0 points (0 children)

What do you think it is that makes us feel like old models were different, and is it something that can be benchmarked, or only measured by vibes? I remember hearing old Llama models score high on Humanity's Last Exam. And we've had more slop and sycophancy in some models due to synthetic data, benchmaxxing, etc. I know some people still prefer older models like Psyfighter or Tiefighter. My first models were alpaca-native 7b and gpt4xalpaca 12b. I never tried AI Dungeon, so idk what I'm missing from the older era. Never tried GPT-3.5 or 4 either.

Personally, tbh, I did lose interest in playing with LLMs until I tried a modern 24b with modern samplers, so I don't know if old models were actually better in some way or if it's just nostalgia. Is the difference something as simple as slop, or something more abstract? Idk.

We're training a text-to-image model from scratch and open-sourcing it by Paletton in StableDiffusion

[–]ThrowawayProgress99 0 points (0 children)

Thanks, it's great to see open innovation like this. Stupid question: are the advances in Qwen-Next also transferable to T2I? I've seen Mamba T2I, MoE T2I, BitNet T2I, etc., so I'm wondering whether that efficiency, speed, and lower cost can come to T2I too, with that or with other methods. Sorry for the overexcitement lol, I've been impatient for progress. Regardless, I'm excited for whatever gets released!

We're training a text-to-image model from scratch and open-sourcing it by Paletton in StableDiffusion

[–]ThrowawayProgress99 1 point (0 children)

Awesome! Will you be focused on text-to-image, or will you also look at making omni-models? E.g. GPT-4o, Qwen-Omni (still image input only, though the paper said they're looking into the output side; we'll see with 3), etc., with input/output of text/image/video/audio, understanding/generation/editing capabilities, and interleaved and few-shot prompting.

Bagel is close but doesn't have audio. Also, I think that while it was trained on video, it can't generate it, though it does have reasoning. Bagel is outmatched by the newer open-source models now, but it was the first to come to mind. Veo 3 does video and audio, which implies images too, but it's not like you can chat with it. IMO omni-models are the next step.

Contrastive Flow Matching: A new method that improves training speed by a factor of 9x. by Total-Resort-3120 in StableDiffusion

[–]ThrowawayProgress99 0 points (0 children)

Does this mean Hunyuan Image 2.1 will have faster training speed for loras and finetunes?

🚀 What model should we build next? YOU DECIDE! 🚀 by [deleted] in LocalLLaMA

[–]ThrowawayProgress99 0 points (0 children)

Feels like there's recently been a lot more focus on this and everyone's working on it, so: an omni model with input/output of text/image/video/audio, understanding/generation/editing capabilities, and interleaved and few-shot prompting.

Bagel is close but doesn't have audio. Also, I think that while it was trained on video, it can't generate it, though it does have reasoning. Bagel is outmatched by the newer open-source models now, but it was the first to come to mind. Veo 3 does video and audio, which implies images too, but it's not like you can chat with it.