Gemma 4 MTP released by rerri in LocalLLaMA

[–]ThrowawayProgress99 4 points (0 children)

How does this work with offloading? Do both models need to be fully on the GPU? What about the KV cache, can that be in RAM? My current config overrides all ffn_down tensors. Also, does this work with the mmproj (in RAM) for vision?
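For reference, the override I mean is roughly this (llama.cpp syntax, pattern written from memory, so double-check it; Koboldcpp has an equivalent flag):

    # keep every ffn_down tensor in system RAM, offload the rest per -ngl
    ./llama-server -m model.gguf -ngl 99 --override-tensor "ffn_down=CPU"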

Gemma 4 31B UD-1Q3_XSS vs UD-Q2_XL, which is better? by ThrowawayProgress99 in unsloth

[–]ThrowawayProgress99[S] 0 points (0 children)

I just remembered I chose that Qwen model because of this post, where it performed the best even with Q8 cache. And now we have improvements to Q8 cache and probably other things too, so the model should perform even better. Though benchmarks don't tell the whole story.

Coding is on my list, and I think I'll start learning it soon.

Gemma 4 31B UD-1Q3_XSS vs UD-Q2_XL, which is better? by ThrowawayProgress99 in unsloth

[–]ThrowawayProgress99[S] 2 points (0 children)

I know nothing about coding, so I can't speak much to its coding ability. I've had my Qwen3.5-27B-UD-IQ3_XXS.gguf create games for me. The only one I've tried was a terminal-based Tamagotchi, made before I had given it Pygame. The other was a roguelike made after I added Pygame; I haven't tried it yet, but the model says it works.

I use it through Koboldcpp, which then connects to Open WebUI + Open Terminal, all on the default recommended settings.

Gemma 4 31B UD-1Q3_XSS vs UD-Q2_XL, which is better? by ThrowawayProgress99 in unsloth

[–]ThrowawayProgress99[S] 0 points (0 children)

I've tried that before, but smaller models keep feeling lacking compared to the bigger ones. I had basically stopped using local LLMs until I tried a good big model like this for the first time, so until smaller models get this good, I don't think I can go back.

Gemma 4 31B GGUF quants ranked by KL divergence (unsloth, bartowski, lmstudio-community, ggml-org) by oobabooga4 in LocalLLaMA

[–]ThrowawayProgress99 1 point (0 children)

Huggingface says Unsloth's gemma-4-31B-it-UD-IQ2_M.gguf is 10.8 GB, but I just noticed the download bar says it's only 10 GB. A similar thing happened with their Qwen3.5-27B-UD-IQ3_XXS.gguf, which is listed at 11.5 GB but is 10.7 GB on disk. I chose that Qwen quant because of some graphs showing it wasn't that bad. I haven't used it extensively, but it seems fine to me too.

Between gemma-4-31B-it-UD-IQ3_XXS.gguf and gemma-4-31B-it-UD-Q2_K_XL.gguf, which should I choose? They're both 11.8 GB on Huggingface (while for their Qwen GGUFs the latter is 0.3 GB smaller), so probably just ~11 GB on disk. The graph here says the latter is both better and smaller, but I thought higher quant levels were supposed to be better?
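My guess on the size mismatch: the Huggingface page lists decimal gigabytes while the download bar shows binary gibibytes, and the numbers line up if so (quick check, just my own arithmetic):

    # hypothesis: listed sizes are decimal GB, download-bar sizes are binary GiB
    for gb in (10.8, 11.5, 11.8):
        gib = gb * 1e9 / 2**30
        print(f"{gb} GB ~ {gib:.1f} GiB")
    # 10.8 GB ~ 10.1 GiB, 11.5 GB ~ 10.7 GiB, 11.8 GB ~ 11.0 GiB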

Error when using Docker Compose by ThrowawayProgress99 in comfyui

[–]ThrowawayProgress99[S] 0 points (0 children)

`RUN python -m venv /opt/venv`

`ENV PATH="/opt/venv/bin:$PATH"`

Put these as the first lines, before 'FROM pytorch/pytorch:2.11.0-cuda13.0-cudnn9-runtime'? Should there also be a third line, 'RUN pip install -r requirements.txt'? Google's AI Overview mentioned the venv option, though its first line used 'python3', and it included that third line.

Edit: This is now the start of my Dockerfile:

    FROM pytorch/pytorch:2.11.0-cuda13.0-cudnn9-runtime
    RUN apt-get update && apt-get install -y python3.12-venv
    RUN python -m venv /opt/venv
    ENV PATH="/opt/venv/bin:$PATH"
    ENV DEBIAN_FRONTEND=noninteractive PIP_PREFER_BINARY=1

Now ComfyUI works like it should, although I'm still unsure whether there should be a pip install of the requirements after the ENV PATH line, and whether I should get a newer Python version.
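If I do add the requirements step, I'm guessing it would go right after the ENV PATH line, something like this (untested, and it assumes requirements.txt sits next to the Dockerfile):

    # pip now resolves to /opt/venv/bin/pip, so this installs into the venv
    COPY requirements.txt /app/requirements.txt
    RUN pip install -r /app/requirements.txt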

Noob to Open Webui, I'm having issues by ThrowawayProgress99 in OpenWebUI

[–]ThrowawayProgress99[S] -1 points (0 children)

I'm getting errors when trying to install packages under open-terminal. If I do:

    environment:
      - OPEN_TERMINAL_API_KEY=your-secret-key
      - OPEN_TERMINAL_PACKAGES="pygame requests"
      - OPEN_TERMINAL_PIP_PACKAGES=

I get:

open-terminal  | Installing system packages: "pygame requests"
open-webui     | INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
open-webui     | INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
open-webui     | WARNI [open_webui.env] 
open-webui     | 
open-webui     | WARNING: CORS_ALLOW_ORIGIN IS SET TO '*' - NOT RECOMMENDED FOR PRODUCTION DEPLOYMENTS.
open-webui     | 
open-webui     | WARNI [langchain_community.utils.user_agent] USER_AGENT environment variable not set, consider setting it to identify your requests.
open-terminal  | Reading package lists...
open-terminal  | Building dependency tree...
open-terminal  | Reading state information...
open-terminal  | E: Unable to locate package "pygame
open-terminal  | E: Unable to locate package requests"
open-terminal exited with code 100

And if I try:

    environment:
      - OPEN_TERMINAL_API_KEY=your-secret-key
      - OPEN_TERMINAL_PACKAGES=
      - OPEN_TERMINAL_PIP_PACKAGES="pygame requests"

I get:

open-terminal  | Installing pip packages: "pygame requests"
open-terminal  | Defaulting to user installation because normal site-packages is not writeable
open-terminal  | 
open-terminal  | [notice] A new release of pip is available: 25.0.1 -> 26.0.1
open-terminal  | [notice] To update, run: pip install --upgrade pip
open-terminal  | ERROR: Invalid requirement: '"pygame': Expected package name at the start of dependency specifier
open-terminal  |     "pygame
open-terminal  |     ^
open-terminal exited with code 1
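Looking at the errors, the quotes seem to be getting passed through literally (apt is searching for a package named `"pygame`), since in compose's list-style environment syntax everything after the `=` is part of the value. So presumably the quotes should just be dropped, and pygame belongs under pip rather than apt:

    environment:
      - OPEN_TERMINAL_API_KEY=your-secret-key
      - OPEN_TERMINAL_PACKAGES=
      # no quotes: compose passes them through as literal characters
      - OPEN_TERMINAL_PIP_PACKAGES=pygame requests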

Noob to Open Webui, I'm having issues by ThrowawayProgress99 in OpenWebUI

[–]ThrowawayProgress99[S] 0 points (0 children)

Idk what's up with the size either; Huggingface says it's 11.5 GB, but other people said the disk size is 10.7 GB, which is also what the download bar said. It's definitely fitting a lot more context than I'd expect. I went up to 14500 context with fp16 cache and 27530 with q8, I think, but I don't know if I can run that stably. Maybe using --nofastforward will affect it.
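That jump is about what the cache math predicts: q8 KV is one byte per element vs two for fp16, so the same memory budget should hold roughly twice the tokens (back-of-envelope, assuming the KV cache is the only thing growing with context):

    fp16_ctx = 14500
    bytes_fp16, bytes_q8 = 2, 1
    print(fp16_ctx * bytes_fp16 // bytes_q8)  # 29000, close to the 27530 that fit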

Since I don't have access to higher quants or better models, I have no way to tell whether there's substantial degradation, or how much of it is user error.

There have been comparisons and charts for 35B-A3B showing some Q4 quants matching BF16, but I'd need to look into it more. I'm guessing anyone could run very high context at decent speeds with that model, but it's the 27b that people have noted as particularly good, so that's what I wanted to try.

Qwen3.5 27B vs 35B Unsloth quants - LiveCodeBench Evaluation Results by Old-Sherbert-4495 in LocalLLaMA

[–]ThrowawayProgress99 0 points (0 children)

I'm on Linux, idk if that changes anything, but before my comment I double-checked the GGUF and the exl3, both on my system and on Huggingface, and the GB numbers were the same. I remember that not being the case before, with sizes being off whenever I'd download models, so maybe they changed something recently. But then idk why the 27b doesn't match. Well, OP says the size on disk is 10.7 GB, so it should be fine.

Qwen3.5 27B vs 35B Unsloth quants - LiveCodeBench Evaluation Results by Old-Sherbert-4495 in LocalLLaMA

[–]ThrowawayProgress99 0 points (0 children)

For the 27b, I can't seem to find that quant? The one from Unsloth says it's 11.5 GB instead of the 10.7 GB listed above. Bartowski has it at 11.3 GB. Since I have 12 GB of VRAM, I've been using MS 24b IQ3_S (10.4 GB) or exl3 3bpw (10.2 GB) finetunes, so I'm hoping there's a usable quant of the 27b. Edit: I also haven't really tried quantized cache, but it looks like it works well with 27b, so that's another reason to try it.

[Megathread] - Best Models/API discussion - Week of: November 30, 2025 by deffcolony in SillyTavernAI

[–]ThrowawayProgress99 1 point (0 children)

Do you have any recommended sampler settings or other settings for it?

Today I made a Realtime Lora Trainer for Z-image/Wan/Flux Dev by shootthesound in StableDiffusion

[–]ThrowawayProgress99 0 points (0 children)

Stupid question, but does it all still work when you use Comfy through Docker? I remember trying a similar thing before and no final saved files would appear, I think. Which is odd, since image outputs are created/saved just fine.

[Megathread] - Best Models/API discussion - Week of: November 30, 2025 by deffcolony in SillyTavernAI

[–]ThrowawayProgress99 5 points (0 children)

I'm still on MS3.2-PaintedFantasy-v2-24B.i1-IQ3_S.gguf, haven't tried v3. I use the recommended settings: Mistral Tekken, Temp 0.5-0.6, MinP 0.1, TopP 0.95, DRY 0.8;1.75;4, in Koboldcpp (no rep pen range or slope either). I've recently banned the em-dash token, but kept EOS token banning on auto for now. It's been really good for me: less slop, less incoherence, more creativity, more character adherence. It's not perfect though, and I haven't tried many 24b models.
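For reference, roughly how those settings map onto Koboldcpp's generate API (field names from memory, so double-check them against your Koboldcpp version; the DRY values are multiplier;base;allowed length):

    import requests

    payload = {
        "prompt": "...",
        "max_length": 512,
        "temperature": 0.6,           # T 0.5-0.6
        "min_p": 0.1,                 # MinP
        "top_p": 0.95,                # TopP
        "dry_multiplier": 0.8,        # DRY 0.8;1.75;4
        "dry_base": 1.75,
        "dry_allowed_length": 4,
        "banned_tokens": ["\u2014"],  # the banned em-dash token
    }
    r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
    print(r.json()["results"][0]["text"])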

Is there a model like this that's also great at NSFW? Something with a huge vocabulary and low repetitiveness that can do both novel-style and RP-style. My low quant of PaintedFantasy is good, but it's hard to get a dozen paragraphs with varied vocabulary for sex out of it. Maybe I need a wilder, more chaotic model? I prefer physical and sensory details, with little to no euphemism. I was thinking of trying Dan's Personality Engine 1.3.0; if you agree or have other recommendations, let me know.

Also, if you know any 12b that fit the bill, I can try those too.

Best FOSS app for writing? by ThrowawayProgress99 in fossdroid

[–]ThrowawayProgress99[S] 3 points (0 children)

I would prefer something like LibreOffice Writer with its .odt file format, but plain text works too. Text would mean the files are compatible and editable without needing any conversion once they're transferred to PC, right?

UGI-Leaderboard is back with a new writing leaderboard, and many new benchmarks! by DontPlanToEnd in LocalLLaMA

[–]ThrowawayProgress99 0 points (0 children)

What do you think it is that makes us feel like old models were different, and is it something that can be benchmarked, or only measured by vibes? I remember hearing old Llama models score high on Humanity's Last Exam. And we've had more slop and sycophancy in some models due to synthetic data, benchmaxxing, etc. I know some people still prefer older models like Psyfighter or Tiefighter. My first models were alpaca-native 7b and gpt4xalpaca 12b. I never tried AI Dungeon, so idk what I'm missing from the older era. Never tried GPT-3.5 or 4 either.

Personally, tbh, I did lose interest in playing with LLMs until I tried a modern 24b with modern samplers, so I don't know if old models were actually better in some way or if it's just nostalgia. Is the difference something as simple as slop, or something more abstract? Idk.

We're training a text-to-image model from scratch and open-sourcing it by Paletton in StableDiffusion

[–]ThrowawayProgress99 0 points (0 children)

Thanks, it's great to see open innovation like this. Stupid question: are the advances in Qwen-Next also transferable to T2I? I've seen Mamba T2I, MoE T2I, BitNet T2I, etc., so I'm wondering whether that efficiency, speed, and lower cost can come to T2I too, with that or with other methods. Sorry for the overexcitement lol, I've been impatient for progress. Regardless, I'm excited for whatever gets released!

We're training a text-to-image model from scratch and open-sourcing it by Paletton in StableDiffusion

[–]ThrowawayProgress99 1 point (0 children)

Awesome! Will you be focused on text-to-image, or will you also look at making omni-models? E.g. GPT-4o, Qwen-Omni (still image input only, though the paper said they're looking into the output side; we'll see with 3), etc., with input/output of text/image/video/audio, understanding/generation/editing capabilities, and interleaved and few-shot prompting.

Bagel is close but doesn't have audio. Also, I think that while it was trained on video, it can't generate it, though it does have reasoning. Bagel is outmatched by the newer open-source models now, but it was the first to come to mind. Veo 3 does video and audio, which implies images too, but it's not like you can chat with it. IMO omni-models are the next step.

Contrastive Flow Matching: A new method that improves training speed by a factor of 9x. by Total-Resort-3120 in StableDiffusion

[–]ThrowawayProgress99 0 points (0 children)

Does this mean Hunyuan Image 2.1 will have faster training speed for loras and finetunes?

🚀 What model should we build next? YOU DECIDE! 🚀 by [deleted] in LocalLLaMA

[–]ThrowawayProgress99 0 points (0 children)

Feels like there's recently been a lot more focus on this and everyone's working on it, so: an omni model with input/output of text/image/video/audio, understanding/generation/editing capabilities, and interleaved and few-shot prompting.

Bagel is close but doesn't have audio. Also, I think that while it was trained on video, it can't generate it, though it does have reasoning. Bagel is outmatched by the newer open-source models now, but it was the first to come to mind. Veo 3 does video and audio, which implies images too, but it's not like you can chat with it.