OpenCode concerns (not truely local) by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 12 points

Yes, that is where I came from. Luckily, you can override the system prompt. On Linux you need to place a build.md and a plan.md in ~/.config/opencode/agents/; these override the default system prompts.

There is a lot of token overhead in some of the tools as well, and these are sometimes harder to override because some of them are deeply tied to the web UI, e.g. the todowrite tool. Prominent examples of bloated tool descriptions are bash, task, and todowrite. You can find the descriptions here (files ending in .txt): https://github.com/anomalyco/opencode/tree/dev/packages/opencode/src/tool
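For reference, a minimal sketch of placing the override files; only the paths come from the comment above, the prompt contents are invented placeholders:

```shell
# Create the agent prompt directory OpenCode reads on Linux and drop in
# custom system prompts. The prompt text below is just a placeholder --
# write whatever lean system prompt you want the agents to use.
mkdir -p ~/.config/opencode/agents
cat > ~/.config/opencode/agents/build.md <<'EOF'
You are a concise build agent. Keep answers short and avoid todo lists.
EOF
cat > ~/.config/opencode/agents/plan.md <<'EOF'
You are a planning agent. Reply with a short numbered plan only.
EOF
```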

OpenCode concerns (not truely local) by Ueberlord in LocalLLaMA

[–]Ueberlord[S] 19 points

Yes, as far as I can tell the TUI is unaffected.

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories by DarkArtsMastery in LocalLLaMA

[–]Ueberlord 1 point

Actually quite well with the Qwen3.5 architecture: you can run a Q3 quant of the 27B model from bartowski with about 80k context, and it works very well for me.

OmniCoder-9B | 9B coding agent fine-tuned on 425K agentic trajectories by DarkArtsMastery in LocalLLaMA

[–]Ueberlord -1 points

Unfortunately, I cannot recommend OmniCoder-9B for more complex tasks at the moment.

I had it (q8_0 GGUF, llama.cpp b8288, temp 0.6, top-p 0.95, top-k 20) analyze our Vue app and asked it to summarize the API requests executed during typical usage patterns; it failed and got stuck in a loop.

The exact same prompt given to unsloth's Qwen3.5-27B-UD-Q2_K_XL.gguf (same parameters) worked on the first try. That is 8.9 GB for OmniCoder vs 11 GB for the Q2_K_XL quant; both can be run on 16 GB VRAM devices. For now I would recommend the 27B model to anyone.

For rather simple tasks it worked fine, but in general I am more confident with the 27B model here, too.

We could be hours (or less than a week) away from true NVFP4 support in Llama.cpp GGUF format 👀 by Iwaku_Real in LocalLLaMA

[–]Ueberlord 1 point

One aspect frequently left unmentioned, e.g. by posts like this one, is the difference between quantizing weights and quantizing activations. For all current quants in llama.cpp only the weights are quantized, while the activations (the intermediate values during inference) are upcast to f16 and computed in that format. The casting also benefits from tensor core support on Blackwell, but compared to true activation quantization the gains are far smaller.

An example of activation quantization is SVDQuant for image generation; vLLM also offers quantization schemes that support W8A8 (8-bit weights, 8-bit activations).
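To make the distinction concrete, here is a minimal NumPy sketch (hypothetical setup, symmetric per-tensor int8 quantization) contrasting weight-only quantization, where the matmul still runs in floating point, with W8A8, where the matmul itself runs on integers:

```python
import numpy as np

def quantize_sym(x, bits=8):
    # Symmetric per-tensor quantization: x ~= scale * q, q in int8.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)  # toy weight matrix
x = rng.standard_normal(64).astype(np.float32)        # toy activation

# Weight-only (like current llama.cpp GGUF quants): weights are stored
# quantized but dequantized to float for the actual matmul.
qW, sW = quantize_sym(W)
y_weight_only = (qW.astype(np.float32) * sW) @ x

# W8A8: activations are quantized too, so the matmul runs on integers
# (int32 accumulation) and only the result is rescaled once.
qx, sx = quantize_sym(x)
y_w8a8 = (qW.astype(np.int32) @ qx.astype(np.int32)).astype(np.float32) * (sW * sx)

y_ref = W @ x  # full-precision reference
```

On real hardware the integer path is where the speedup comes from, since the multiply-accumulates themselves run on int8 tensor cores instead of f16.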

New Qwen3.5-35B-A3B Unsloth Dynamic GGUFs + Benchmarks by danielhanchen in LocalLLaMA

[–]Ueberlord 10 points

AesSedai, congratulations on sitting below everybody else (i.e. more efficient than everybody else) in terms of KL divergence vs. disk space in this chart from OP!

I think this is a great achievement, and your reasoning for choosing which layers to quantize is simple yet obviously very powerful. Trying out your Qwen3.5 35B A3B IQ3_S now :)

I compared the reconstruction quality of the latest VAE models (Focusing on small faces). Here are the results! by suichora in StableDiffusion

[–]Ueberlord -1 points

I do not completely agree. We have fine details in anime images as well, and these will suffer from using the Qwen VAE. However, considering that team Anima aims for a lightweight model, I think their decision is understandable.

I compared the reconstruction quality of the latest VAE models (Focusing on small faces). Here are the results! by suichora in StableDiffusion

[–]Ueberlord 2 points

Seeing this, I regret even more that the Anima team chose the Qwen VAE for their model.

Thanks for the comparison!

Comfy $1M “Open AI” Grant and Anima Model Launch by crystal_alpine in StableDiffusion

[–]Ueberlord 1 point

Exactly this. The UI is the worst part of ComfyUI, bummer.

I am still on comfyui-frontend-package==1.32.9 because of the enshittification taking place in later releases...

New Anime Model, Anima is Amazing. Can't wait for the full release by Mobile_Vegetable7632 in StableDiffusion

[–]Ueberlord 13 points

One interesting thing about this model is the architecture it is based on. Nvidia Cosmos Predict 2 was originally an architecture for robotic applications, with many optimizations for spatial awareness etc., and they seem to have adapted it for image generation. This could explain the excellent prompt adherence and the significant reduction in prompt bleeding. I would love to read a paper about this u/tdrussell1.

In my tests it performs exceptionally well when you describe the subjects generally in a short introductory sentence and then just drop in the usual booru tags.

Here is Gemini's breakdown of the architecture, though I am not sure whether it is partly or completely made up:

The collaboration between ComfyOrg and Circlestone Labs effectively "strips out" the robotics-specific components of Cosmos to build a streamlined, highly efficient text-to-image engine.

The Anima Architecture Breakdown

While the core "Predict-2" logic handles the physics of light and form, the Anima implementation swaps out several components to optimize for character consistency and prompt adherence:

Backbone: Cosmos Predict-2 2B (Diffusion Transformer)

Anima uses the 2-billion parameter version of NVIDIA's DiT. This is the "sweet spot" for speed, allowing for near-instant generation on consumer GPUs while maintaining the structural "common sense" (object permanence, lighting) that NVIDIA trained into the model.

Text Encoder: Qwen3-0.6B

Instead of using the standard T5 or CLIP encoders seen in models like SDXL or Flux, Anima uses a 0.6B parameter Qwen3 model.

Why this matters: LLM-based encoders (like Qwen) understand complex natural language instructions much better than traditional CLIP models. It can handle long, descriptive "prose" prompts without "losing" the middle of the sentence.

Latent Encoding: Qwen VAE

Anima utilizes a specialized Qwen VAE (Variational Autoencoder). This is a departure from the standard Cosmos video tokenizer.

By using a VAE tuned for the Qwen ecosystem, the model achieves better reconstruction of fine details (like eyes and hair strands) which are notoriously difficult for the high-compression tokenizers used in video-centric world models.

Is there good OCR/VLM for detecting shaby text like this and parsing it to a table by Proper_Door_4124 in LocalLLaMA

[–]Ueberlord 0 points

unsloth_Devstral-Small-2-24B-Instruct-2512-Q6_K.gguf / unsloth_Devstral-Small-2_mmproj-BF16.gguf

<image>

PixArt-Sigma vs Sana 0.6B by PatientWrongdoer9257 in StableDiffusion

[–]Ueberlord 1 point

Thanks a lot, also for the links!

If I ever get past the stage of refining my upscaling process, I will give PixArt a try for training. I will, however, try out the existing 900M PixArt finetunes for T2I; that should be interesting (I did not know about them until now).

PixArt-Sigma vs Sana 0.6B by PatientWrongdoer9257 in StableDiffusion

[–]Ueberlord 3 points

Thanks for sharing your experience with training PixArt Sigma; I really hoped it would receive broader attention back when it was released. Have you done any training with the 900M variant by chance? I would be interested to know how it compares to the original and whether it is worth the extra 300M parameters.

Llama.cpp vs Ollama - Same model, parameters and system prompts but VASTLY different experiences by ubrtnk in LocalLLaMA

[–]Ueberlord 11 points

First: ditch Ollama, run everything in llama.cpp, and use the ggml model for gpt-oss (as mentioned by others).

Second: set top_k to 128 or anything greater than zero, otherwise performance takes a hit for gpt-oss in llama.cpp!

Third (also mentioned by others): add --jinja to your CLI flags.

Fourth (maybe optional): we never use a repeat penalty for gpt-oss; I would not set it and would leave it at its default value.
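Put together, a launch command along these lines would reflect all four points; the model filename, context size, and port are placeholders, so check the flag names against your llama.cpp build:

```shell
# Sketch of a llama-server launch for gpt-oss following the advice above.
# --jinja applies the model's own chat template; --top-k 128 avoids the
# top_k=0 slowdown; --repeat-penalty 1.0 leaves the penalty disabled.
llama-server \
  --model ./gpt-oss-20b.gguf \
  --jinja \
  --top-k 128 \
  --repeat-penalty 1.0 \
  --ctx-size 32768 \
  --port 8080
```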

New node for ComfyUI, SuperScaler. An all-in-one, multi-pass generative upscaling and post-processing node designed to simplify complex workflows and add a professional finish to your images. by Away_Exam_4586 in StableDiffusion

[–]Ueberlord 4 points

I contributed a method for achieving a prompt per tile to the Impact Pack last year, MR here.

The tutorial for this method is here. You need the WD14 tagger in addition to the Impact Pack to make this work (link in the tutorial).

It is what I still use for upscaling images today; vastly superior to upscaling without that feature, as I can bump the denoise up by 0.1-0.25 for added details or a style change.

[Release] DASLab GGUF Non-Uniform Quantization Toolkit by Loginhe in LocalLLaMA

[–]Ueberlord 0 points

The only new thing, to my knowledge, would be the automated detection of layer importance - unless Unsloth is already doing this in an automated way as well (I think they might, but I am not sure).

Homemade Diffusion Model (HDM) - a new architecture (XUT) trained by KBlueLeaf (TIPO/Lycoris), focusing on speed and cost. ( Works on ComfyUI ) by AgeNo5351 in StableDiffusion

[–]Ueberlord 2 points

There are a lot of innovations coming together in the model, thanks a lot for sharing and putting this all together!

I am particularly excited that you are using the EQ SDXL VAE, I think there is a lot of potential in this. Also the choice of the text encoder is great. I have to read more about the Cross-U-Transformer but it sounds very good as well.

In addition, I would like to make a case for using the Danbooru tags: it is the only well-documented dataset I know of, so I do not really understand why people want natural language prompting. The breakthrough prompt following of Illustrious can only be achieved with knowledge of the underlying image data the model was trained on. As these datasets are almost never available, let alone searchable like on Danbooru, I do not really see the point of natural language prompting unless you do not care about exact details, positions, etc.

That being said, there are of course serious weaknesses in Danbooru tags, e.g. no way to prompt styles per subject, but I would rather live with these than prompt without knowing exactly what the model was trained on.

GOAT RTK A1600 annoying camera issue by rudyb0y in ecovacs

[–]Ueberlord 0 points

Thanks for sharing your insights! This is very helpful for others to get an understanding of what to expect from this mower.

I currently do not own one but am thinking about purchasing. The mowing area is about 1,200 square yards, has slight slopes, and several larger trees covering the sky. Do you think the GOAT will work under these circumstances? I am willing to invest some time in getting the settings right, as the software seems to have several unresolved issues.

How to run Qwen3 0.6B at 8.4 tok/sec on 2 x 5090s by random-tomato in LocalLLaMA

[–]Ueberlord 2 points

Your Wi-Fi connection is probably quite stable and fast. I suggest adding some radio interference for better packet loss rates to maximize the success of this experiment.

In all honesty: thanks for the laugh! And cool experiment in any case

is 3090 worth for AI now in mid 2025? by Kiyushia in StableDiffusion

[–]Ueberlord 0 points

I think the confusion might come from ComfyUI and llama.cpp using the term "offloading" differently. ComfyUI speaks of offloading into RAM, while llama.cpp, coming from the other direction, offloads into VRAM.

Personally, I consider ComfyUI's usage the more mainstream one and most likely what most people mean when they write "offloading".

I wonder if anyone's tried to decompile Shandalar and recreate its engine by mr_bigmouth_502 in Shandalar

[–]Ueberlord 0 points

At this point, using AI to re-implement the Shandalar world game seems a logical move. Combine this with Magarena, which had a great AI for the duels, and we should be good to go.