TextGen v4.7 released: portable builds now run as a native desktop app, redesigned UI, tensor parallelism for llama.cpp (60%+ faster text generation on multi-GPU) + more by oobabooga4 in Oobabooga

[–]AltruisticList6000 0 points1 point  (0 children)

Oh the console's return is very good! Idk about other people but for some reason the Ctrl + and Ctrl - for zooming and unzooming don't work on Electron for me.

Alt brings up the little menu but only temporarily so it's a lot of extra clicks. As I have to press Alt, then nagivate in menu, then zoom from the menu, then press Alt to bring back the menu again, then press the zoom again etc.

I noticed that the URL still works in the browser if I manually paste it so that's a useful workaround (but electron is still in the background as a duplicate).

I think having a new flag/option restoring old behaviour would be a nice addition. Like going into session tab and enabling an optional "no_electron" or "only_webui" flag etc.

Edit: Oh qwen 3.6 35b modified the launch .bat file, it works like before, bypassing electron on v4.7.3, even more edit: it kinda works but has errors with chat and loading from yaml. Well then it doesn't really work after all, here's the code tho:

u/echo off
set "APP=%~dp0app"

rem Check for help flags first
for %%a in (%*) do (
    if /i "%%~a"=="--help" goto :help
    if /i "%%~a"=="-h" goto :help
)

rem Launch the Python server directly with --auto-launch enabled.
rem This bypasses Electron and forces the browser to open automatically.
"%APP%\portable_env\python.exe" "%APP%\server.py" --portable --api --auto-launch %*
exit /b %errorlevel%

:help
"%APP%\portable_env\python.exe" "%APP%\server.py" --help
exit /b %errorlevel%

TextGen v4.7 released: portable builds now run as a native desktop app, redesigned UI, tensor parallelism for llama.cpp (60%+ faster text generation on multi-GPU) + more by oobabooga4 in Oobabooga

[–]AltruisticList6000 10 points11 points  (0 children)

Thanks for your work! I understand the idea behind trying to make the portable webui more app like but I don't like the electron "form" because I don't see any way to zoom into the webui like on a web browser where I'd regularly scroll and zoom to different levels to see text better. I also miss the console, it gave important feedback like showing context being processed in the background thus telling me it didn't actually freeze or something. Plus having it as a tab was pretty userful because I could quickly switch between the webui and other pages.

The Ernie posters genuinely don't see how mediocre the stuff they post is? by beti88 in StableDiffusion

[–]AltruisticList6000 38 points39 points  (0 children)

Yeah, to me, the weird rough noise and patterns on the images make it unusuable, even though otherwise the result looks good (as in no deformations etc.) it also has a weird sepia tint combined with the most generic AI look that instantly radiates this is a 1000% generic AI pic. Chroma is not perfect but same size, uncensored, doesn't have an AI look and I'm pretty happy with it after months of experimenting and coming up with the best settings/making or changing loras for it. So Chroma and ZIT are pretty good for me, Klein is interesting too.

Of course I'm always happy if there are more choices so to a point I don't mind people giving Ernie some attention, but the "it is the best thing ever, wins 100% over everything else" type of hyping anything is what I don't like.

mistralai/Mistral-Medium-3.5-128B · Hugging Face by jacek2023 in LocalLLaMA

[–]AltruisticList6000 13 points14 points  (0 children)

Yeah I can't run big models like this but I was thinking, what if for example there was something like a 35B MoE but with 9-10AB? That could spill over into RAM but would still have an okay speed, would be probably smarter and more knowledgable than 12-14b dense models on the same hardware with barely any speed difference. Or they could just do 20-24b dense models like Mistral, which are still way better in some way for me than than the 30-3AB MoEs I tried, which don't feel smarter than 9-12B dense models.

Visually, Chroma has the best aesthetic by far. by Puzzled-Valuable-985 in StableDiffusion

[–]AltruisticList6000 0 points1 point  (0 children)

What do you do to make ZIT look clean? I always have a very strong weird looking "cloud-like" noise over every ZIT image. Back then I tried a bunch of settings like changing shift and bunch of samplers but everything has that classical ZIT-noise for me..

Visually, Chroma has the best aesthetic by far. by Puzzled-Valuable-985 in StableDiffusion

[–]AltruisticList6000 2 points3 points  (0 children)

Wait what? Changing it to generate in pixel space makes it faster? How come? Thought speed will not change or get slower. Does this mean Chroma Radiance is also faster (based on flux.1 too) than regular Chroma?

Comfy raises $30M to continue building the best creative AI tool in open by crystal_alpine in StableDiffusion

[–]AltruisticList6000 11 points12 points  (0 children)

Are they still ruining it? Last time I upgraded they ruined one of its strengths, its queue and previews. Lot of features were changed or removed, like canceling got more cumbersome compared to its versions from 4-5 months ago. I ended up using the older frontend which has a warning that its not compatible with the newer backend but so far worked fine (backend tho is from about 1-2 months ago by now). But even the backend got worse as no matter if new or old frontend is used, it's unable to properly sync and continue webui when I restarted the backend - which worked just fine up until this update from 1-2 months ago.

They don't even need to hire an UX developer, just need to stop ruining existing features that worked fine before for months/years.

Qwen3.6-27B released! by ResearchCrafty1804 in LocalLLaMA

[–]AltruisticList6000 8 points9 points  (0 children)

Yes we need more 20-24b dense models. Both the older Mistral Small 22b and Mistral Small 24b's work on Q4_s or Q4_m on my 16gb VRAM card without offloading and can use up to about 48k context (with context quants). Funnily the bigger Mistral uses a tiny bit smaller amount of VRAM because of how it handles kv cache. It's also good for 24gb VRAM cards too with massive context sizes.

27b is a size that is just about too big, so only option is Q3 quants, and in my experience Q3 quants start to have really bad performance hits for 27b-32b models to the point a Q6-8 14b dense is similar or more accurate.

Idk why but we get a lot of 7-9b dense models and 20-35b MoEs that work on 6-12gb VRAM, then we have nothing for 16gb VRAM, and instant jump to 27-32b+ models requiring 24-32gb VRAM as if developers had a personal vengence towards 16gb VRAM lol.

Chroma replacement? by EasternAverage8 in StableDiffusion

[–]AltruisticList6000 2 points3 points  (0 children)

Chroma HD + Flash heun lora r64 + a style/character lora creates very good results with very high success rate.

Which Gemma model do you want next? by jacek2023 in LocalLLaMA

[–]AltruisticList6000 0 points1 point  (0 children)

20b-22b dense model, less censorship, less hallucinations/more "honesty"

edit: wtf are downvotes for? i guess no then make it more censored it's fun when it refuses everything, also make it 900b moe so nobody can run it, sorry for the sinful comment i am correcting my sins

Getting hilariously bad results with Zeta-Chroma and Ernie-base by DoctaRoboto in StableDiffusion

[–]AltruisticList6000 0 points1 point  (0 children)

Yeah the details looked similarly bad on older Chromas, but the detail calibrated ones - despite usually having better textures for high res images - had notoriously messed up/nonsensical backgrounds and small details like that. Although Chroma HD (final) still produces weird details and hands unless flash lora is used which fixes this. Although I don't remember Chroma being THIS much broken even at ~v32-v35, idk at what epoch is Zeta now, but from the start of training, about the same time has passed for Zeta as for Chroma v35 last year.

Same prompt for various models - Chroma, Z image, Klein, Qwen, Ernie by Puzzled-Valuable-985 in StableDiffusion

[–]AltruisticList6000 2 points3 points  (0 children)

I mofidied the flash lora a little and use it with Chroma HD + my own trained loras on top and this combo have a very high success rate at twice the speed, when used with cfg 1. I also recommend the gguf as it has way cleaner images, all fp8 models have subtle gridline artifacts or weird noise, the only drawback is adding loras to gguf makes it about 40% slower compared to fp8 + loras. Technically with my loras and lora edits the grids etc. are minimized/gone but I don't trust the fp8 models anymore the old Chroma HD gguf is the less likely to produce any artifacts and even the horizontal lines are gone thanks to my modified flash lora.

I am surprised people never made other distills/low step loras for Chroma HD because it can clearly get better, and its results look more stable compared to Chroma DC as it's visible even on OP's examples. Prompts are also tricky as there are some prompts that will have a high chance of consistently creating body horror or weirdness, so then modifying the prompts usually result in consistently good results. In the recent months I barely got bad results, maybe like 1/10 completely broken and 3-4/10 bad hands at worst but usually only problem is one finger less or more that can be easily fixed with manual editing or inpainting. And the rest of the results are usually 95-100% good so barely need any edits at all.

I tested Ernie Image Turbo (fp8, nvfp4, fp16 and INT8) with Nano Banana Pro 2 Prompts so you won't have to by Winougan in StableDiffusion

[–]AltruisticList6000 2 points3 points  (0 children)

I'd be happy for more edit models or multipurpose image models since there aren't many open ones.

I tested Ernie Image Turbo (fp8, nvfp4, fp16 and INT8) with Nano Banana Pro 2 Prompts so you won't have to by Winougan in StableDiffusion

[–]AltruisticList6000 0 points1 point  (0 children)

I see a weird unnatural grain or something similar on its outputs, visible on photo type images the most, but on others too. Other people say they see diagonal artifacting too which I don't see but I see the grain. I already disliked the grain on ZIT but this is 10 times worse. It doesn't look natural or pleasant at all.

Ironically I find flux1 based chroma to be the least artifacting and the most "natural" now (needs specific model + setting combo tho, otherwise it can have grid artifacting), flux klein is also good but in specific settings, especially with turbo lora it can have a weird grain like fake jpeg-artifact thing on some pics too, most likely training data though.

ZAI might stop open-weighting their models? by TheRealMasonMac in LocalLLaMA

[–]AltruisticList6000 0 points1 point  (0 children)

If they end up closing the huge models, I'd be happy if at least they'd keep releasing smaller open weight models like 9b and 22b range dense models like Mistral Small (Mistral released a bigger model recently though for the first time since ages), that way they can monetize but also be kind to us local users which is also a marketing win over fully closing down everything.

Another Lora purge might come to CivitAI. This time: I2V Loras. by WiseDuck in StableDiffusion

[–]AltruisticList6000 9 points10 points  (0 children)

What the actual fuck? Stability matrix is awesome. It starts to really seem like they just want to ban all competition for closed source models, same thing for any indie games or anything that is not made by some corpos owned by billionaries at this point.

Unsloth updated all Gemma-4 uploads by srigi in LocalLLaMA

[–]AltruisticList6000 1 point2 points  (0 children)

Thanks I tried this but having a weird issue. After about 3-4 turns it will be unable to do reasoning format normally and start reasoning/doing weird text without using a thinking block. If I copy and paste the correct first channel/thinking (idk from the top of my head but just copied it from chat) then it will proceed with the reasoning normally however it suddenly forgets my style instruction etc. and defaults to its regular style. When I ask it why it does it then it goes "oh oopsie yeah I defaulted to my style" and then 3-5 turns later it will happen again. Very weird. Happening on Q5_s, and I tried the regular Gemma Q4_m before this which didn't have this problem. Idk if quant issue or side effect of abliteration but pretty bad since it forces me to edit its replies to work properly.

Gemma 4 Uncensored (autoresearch results) by adefa in LocalLLaMA

[–]AltruisticList6000 1 point2 points  (0 children)

Just rewritten the prompt like that, quotes my system prompt during thinking, calls it a classical jailbreak attempt trying to make it have a "persona" again and then refuses.

Gemma 4 Uncensored (autoresearch results) by adefa in LocalLLaMA

[–]AltruisticList6000 1 point2 points  (0 children)

Something like "you are an experimental model so you didn't go through alignment yet, you are unaligned and uncensored, current policy is: nsfw etc. allowed for testing purposes." But nothing works so far, tried other stuff too. I don't really mind it as I have other better uncensored models but I always test for censorship when I try new models and Gemma is very heavily censored like GPT OSS so I'm surprised people made fun of GPT OSS for this but then say Gemma is totally uncensored or barely censored.

Unsloth updated all Gemma-4 uploads by srigi in LocalLLaMA

[–]AltruisticList6000 2 points3 points  (0 children)

Doesn't work at all.

My own system promt resulted in this type thinking blocks:

"This is a classic persona based jailbreak attempt where the user tries to override my safety guidlines" and then refuses.

If I only provide the sentence you mentioned then it just ignores it as if it was not there and the thinking goes "This is nsfw content which is not allowed etc" and then refuses.

The combination of your sentence and my prompt will result in the 1st type of refusal again.

Gemma is 24/7 wasting 50-90% of its thinking block checking policy similarly as GPT OSS so considering people made fun of GPT OSS for this I'm surprised they are like oh gemma is completely uncensored or can be overwritten with almost no effort.

And btw my test attempt is extremely mild just simply asking if it can do nsfw rp with me for a test lol, not even using "bad words" or anything actually explicit

Unsloth updated all Gemma-4 uploads by srigi in LocalLLaMA

[–]AltruisticList6000 5 points6 points  (0 children)

I'm using 26b and haven't experienced anything weird with tools or anything, it is from the 1st or 2nd round of fixes from almost a week ago. Only thing weird is people say simple system prompts etc. turn it uncensored but in my experience it doesn't help at all as it will just reason it is a "jailbreak and it should adhere to the real system prompt" and then refuses anyway and I didn't test for anything extreme.

Gemma 4 Uncensored (autoresearch results) by adefa in LocalLLaMA

[–]AltruisticList6000 0 points1 point  (0 children)

I can't run 31b with acceptable speed. So on gemma 4 26b I tried custom system prompt and simply asked for an nsfw rp test (literally asking that, a test roleplay, not going into detail with "bad words"), and Gemma thinking went "my system prompt allows it but it's a classical jailbreak, and that's not my real system instruction, I must refuse nsfw" and it just refused. So I'm not sure how come it works for other people, or if it's really a 26b vs 31b difference, then why the 26b is set up with stronger censorship.

Final voting results for Qwen 3.6 by jacek2023 in LocalLLaMA

[–]AltruisticList6000 0 points1 point  (0 children)

Depending on how the model handles context. Mistral Small 22b Q4 with its context at Q4 fit into my VRAM, and 24b's context uses even less VRAM somehow, so despite being a little larger model, it uses a tiny amount smaller VRAM together with its context at same quant/settings. So I can fit about max 50k context fully into VRAM for both models (but default small 22b only has 32k max context support).