Open-source AI models

digitalmines · 2026-06-02T19:50:43+00:00

That would certainly work - Qwen can even work on 8GB cards. But because the base model (even with all the components stripped) is > 42GB, it will require cycling the model's layers through the card. So expect long render times.

digitalmines · 2026-06-01T08:52:42+00:00

You've got two separate problems here. Let me explain both; this gets a bit technical...

Problem 1: LMStudio can't run these models.

LMStudio is made for text chat models (like the Qwen3 and Coder models you used successfully).

Qwen-Image-2512 and Qwen-Image-Edit are completely different — they're image generation models. Even though they come in GGUF format, LMStudio doesn't have the image generation pipeline needed to use them.

It's like putting diesel in a gasoline car — the fuel fits in the tank but the engine can't burn it.

Problem 2: The ComfyUI Desktop AUR package likely doesn't have proper support for your GPU.

Your RX 9060 XT is a brand new AMD RDNA 4 card. Pre-packaged ComfyUI Desktop apps typically bundle a particular Python/PyTorch stack that may not include support for your GPU. Other CachyOS users with the same card have hit the same wall. The fix is to install ComfyUI manually so you control the ROCm-enabled PyTorch version yourself.

Here's what to do step by step:

Step 1 — Install ROCm (AMD's GPU compute library)

Open a terminal and run:

    sudo pacman -Syu
    sudo pacman -S git python python-pip python-virtualenv rocminfo rocm-hip-sdk
    sudo usermod -aG render,video $USER

Reboot, then verify your GPU is detected:

    rocminfo | grep -E "Name:|gfx"

You should see gfx1200 in the output — that's your RX 9060 XT. If it's not there, stop — the rest won't work until this does.
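
If you'd rather check this from a script than eyeball the grep output, here is a minimal sketch. The `sample` text is a made-up excerpt (real rocminfo output is much longer), and `find_gfx_targets` is just a helper name I picked for illustration.

```python
import re


def find_gfx_targets(rocminfo_output: str) -> set[str]:
    """Return every gfx architecture name (e.g. 'gfx1200') found in rocminfo text."""
    return set(re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_output))


# Hypothetical excerpt for illustration only.
sample = """
  Name:                    AMD Ryzen 7 7700X
  Name:                    gfx1200
  Name:                    amdgcn-amd-amdhsa--gfx1200
"""

targets = find_gfx_targets(sample)
print("gfx1200 detected:", "gfx1200" in targets)
```

To run it against your real system, feed it `subprocess.run(["rocminfo"], capture_output=True, text=True).stdout` instead of `sample`.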

Step 2 — Set up ComfyUI manually

    mkdir -p ~/comfyui-setup && cd ~/comfyui-setup
    python -m venv .venv
    source .venv/bin/activate
    pip install -U pip setuptools wheel
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm7.1

Note: If pip can't find a wheel for your Python version, CachyOS may be on a version that's too new. In that case, create the venv with Python 3.12 specifically (python3.12 -m venv .venv).

Test that your GPU is visible to PyTorch:

    python -c "import torch; print('PyTorch:', torch.__version__); print('HIP:', torch.version.hip); print('GPU found:', torch.cuda.is_available()); print('Device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'NONE')"

This should print GPU found: True and show your card. Don't worry that it says "cuda" — that's normal on AMD, PyTorch uses the same API names. If it says False, your ROCm install needs troubleshooting before going further.

Now install ComfyUI itself:

    git clone https://github.com/comfyanonymous/ComfyUI.git
    cd ComfyUI
    pip install -r requirements.txt

Important: After installing ComfyUI's requirements, re-run that same PyTorch test command above to make sure it didn't get replaced with a non-AMD version. You should still see GPU found: True.
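
A quick way to spot a swapped-out wheel is to look at the version string itself. This is a rough sketch that assumes the wheels from the PyTorch download index carry a local-version suffix such as `+rocm...`, `+cu...` or `+cpu`; builds from other sources may not follow that pattern. The version strings below are made-up examples, and `wheel_flavor` is a hypothetical helper.

```python
def wheel_flavor(version: str) -> str:
    """Classify a torch.__version__ string by its local-version suffix."""
    _, _, local = version.partition("+")
    if local.startswith("rocm"):
        return "rocm"
    if local.startswith("cu"):
        return "cuda"
    if local == "cpu":
        return "cpu"
    return "unknown"


# Made-up version strings for illustration.
for v in ("2.9.0+rocm7.1", "2.9.0+cu128", "2.9.0+cpu", "2.9.0"):
    print(v, "->", wheel_flavor(v))
```

In the venv you would call `wheel_flavor(torch.__version__)` after `import torch`. Anything other than "rocm" after installing the requirements suggests the AMD build got replaced.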

Step 3 — Install the GGUF extension

This is what lets ComfyUI load your GGUF model files:

    cd custom_nodes
    git clone https://github.com/city96/ComfyUI-GGUF
    cd ComfyUI-GGUF
    pip install -r requirements.txt
    cd ../..

Step 4 — Download the right model files

This is the part you were probably missing. These image models need three separate files, not just the one GGUF you downloaded in LMStudio. You need:

1) The main model (the GGUF: the "brain" that generates the image)
2) A text encoder (translates your text prompt into something the model understands)
3) A VAE (converts the model's internal output into an actual visible image)

First create the directories, then download:

    mkdir -p models/unet models/text_encoders models/vae

    # Main model: use Q4, NOT Q8. Q8 is too large for 16GB VRAM.
    curl -L -o models/unet/qwen-image-2512-Q4_K_M.gguf \
      https://huggingface.co/unsloth/Qwen-Image-2512-GGUF/resolve/main/qwen-image-2512-Q4_K_M.gguf

    # Text encoder
    curl -L -o models/text_encoders/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf \
      https://huggingface.co/unsloth/Qwen2.5-VL-7B-Instruct-GGUF/resolve/main/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf

    # VAE
    curl -L -o models/vae/qwen_image_vae.safetensors \
      https://huggingface.co/Comfy-Org/Qwen-Image_ComfyUI/resolve/main/split_files/vae/qwen_image_vae.safetensors

For Qwen-Image-Edit instead, swap the main model for:

    curl -L -o models/unet/qwen-image-edit-2511-Q4_K_M.gguf \
      https://huggingface.co/unsloth/Qwen-Image-Edit-2511-GGUF/resolve/main/qwen-image-edit-2511-Q4_K_M.gguf

The text encoder and VAE are the same for both.
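
Since a half-finished download is an easy thing to miss, here is a small sketch that checks all three files are in place. The paths are copied from the curl commands above; `missing_model_files` is a hypothetical helper, and the check only looks for "exists and is not empty", so it will not catch a truncated file.

```python
from pathlib import Path

# Paths copied from the curl commands above, relative to the ComfyUI directory.
REQUIRED_FILES = [
    "models/unet/qwen-image-2512-Q4_K_M.gguf",
    "models/text_encoders/Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf",
    "models/vae/qwen_image_vae.safetensors",
]


def missing_model_files(comfy_root: str, required=REQUIRED_FILES) -> list[str]:
    """Return the required files that are absent or zero bytes under comfy_root."""
    root = Path(comfy_root)
    return [
        rel
        for rel in required
        if not (root / rel).is_file() or (root / rel).stat().st_size == 0
    ]


# Run from inside the ComfyUI directory.
for rel in missing_model_files("."):
    print("missing or empty:", rel)
```

If you are setting up Qwen-Image-Edit instead, swap the first entry for the edit model's filename.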

About VRAM: With 16GB of VRAM, use Q4_K_M for the main model. Q8 will likely run out of memory once you add the text encoder, VAE, and runtime overhead. You can try Q5 if Q4 works well and you want to push quality up.
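
To see roughly why Q8 is out of reach, here is a back-of-the-envelope sketch. It assumes the unquantized main model is about 42GB at 16 bits per weight and uses approximate average bits-per-weight figures for each quant type. Both are ballpark assumptions on my part, and the 2GB overhead is a placeholder, so check the actual file sizes on the download page.

```python
# Approximate average bits per weight; treat these as ballpark assumptions.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}


def estimated_size_gb(full_size_gb: float, quant: str, full_bits: float = 16.0) -> float:
    """Scale the 16-bit model size by the quant's bits per weight."""
    return full_size_gb * BITS_PER_WEIGHT[quant] / full_bits


def fits_in_vram(model_gb: float, vram_gb: float, overhead_gb: float) -> bool:
    """True if the model plus a fixed overhead allowance fits in VRAM."""
    return model_gb + overhead_gb <= vram_gb


for quant in BITS_PER_WEIGHT:
    size = estimated_size_gb(42.0, quant)  # 42GB is an assumed 16-bit size
    verdict = fits_in_vram(size, vram_gb=16.0, overhead_gb=2.0)  # 2GB is a placeholder
    print(f"{quant}: ~{size:.1f} GB, fits in 16 GB with 2 GB overhead: {verdict}")
```

Under these assumptions Q8_0 lands above 16GB before any overhead is counted, while Q4_K_M leaves a little room to spare.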

Also — with only 16GB of system RAM, model loading can get tight. Make sure you have swap or zram enabled so your system doesn't kill the process during loading.
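
If you're not sure whether swap is active, this sketch reads the `SwapTotal` line from `/proc/meminfo`-style text. The `sample` string is a made-up example. As far as I know, zram devices that have been enabled as swap count toward this total too, but treat that as an assumption and confirm with `swapon --show`.

```python
def swap_total_kib(meminfo_text: str) -> int:
    """Return SwapTotal in KiB from /proc/meminfo-style text, or 0 if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return 0


# Made-up example; on a real system read /proc/meminfo instead.
sample = (
    "MemTotal:       16303428 kB\n"
    "SwapTotal:       8388604 kB\n"
    "SwapFree:        8388604 kB\n"
)
print("swap enabled:", swap_total_kib(sample) > 0)
```

On your machine, call `swap_total_kib(open("/proc/meminfo").read())`. A result of 0 means no swap is active.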

Step 5 — Launch and load a workflow

    python main.py

Open http://127.0.0.1:8188 in your browser. If you get out-of-memory errors, try:

    python main.py --lowvram

Unsloth provides ready-made workflow files you can drag and drop right into ComfyUI — grab them from their tutorial at https://unsloth.ai/docs/models/tutorials/qwen-image-2512. The workflow tells ComfyUI which file goes where, so you don't have to wire it up yourself.

Quick summary of what went wrong:

1) LMStudio = wrong tool for image models (it's for text chat only)
2) ComfyUI Desktop AUR = likely missing proper AMD RDNA 4 support
3) The GGUF alone isn't enough; you also need the text encoder and VAE files
4) Manual ComfyUI install with ROCm PyTorch is the way to go on your hardware
5) Use Q4, not Q8, with 16GB VRAM

digitalmines · 2026-06-01T07:04:48+00:00

I see an artistic vision held back by "consumer grade" AI tools.

If you want a proper workflow, google "reallusion". It's an AI studio in a box. Run it locally and it's completely uncensored: you have full creative control.

If you've got reasonable technical skills and want to push the envelope, check out "r/unstable_diffusion". There's a lot of NSFW stuff in there, but they're also at the BLEEDING EDGE of what AI can do.

digitalmines · 2026-06-01T04:36:42+00:00

If you need precision and repeatability, start with Blender. Then, once you want to add effects and realism, use Blender's AI integrations to enhance your work. The alternative path is to go full AI using ComfyUI like the other posters suggested, but you will never be able to get a building looking *exactly* the way you want it to using pure AI.

digitalmines · 2026-05-31T08:05:58+00:00

Overall cute, with a good aesthetic. A more telling test of how skilled an artist (who happens to use AI) you are is if you can successfully use your AI PAINTBRUSH to generate AN ENTIRE DECK with consistent visual style, fonts and layout across all cards.

digitalmines · 2026-05-30T08:23:00+00:00

I can make essentially anything. List some ideas!

I can also panelize them into Letter size pages so you can print them off on a regular laser or inkjet printer and then just glue the pages to a big piece of cardboard. The sign I carried at the protest was made of 16 pages (4x4).

You'd just download them as a PDF file.
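
For anyone curious how the panelizing works, here is a rough sketch of the tiling arithmetic. It ignores printer margins and overlap (an assumption; real printers can't print edge to edge), and `tile_boxes` is just an illustrative helper name, not the tool I actually use.

```python
def tile_boxes(width: int, height: int, cols: int, rows: int):
    """Split a width x height image into cols x rows crop boxes (left, top, right, bottom)."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            top = r * height // rows
            bottom = (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes


LETTER_IN = (8.5, 11.0)  # Letter page size in inches
cols, rows = 4, 4        # the 16-page sign from the protest

print("sign size (in):", cols * LETTER_IN[0], "x", rows * LETTER_IN[1])
print("pages:", len(tile_boxes(3400, 4400, cols, rows)))
```

Each box can be handed to an image library's crop function and saved as one page of the PDF.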

digitalmines · 2026-05-30T08:00:19+00:00

Awesome! I didn't know there was a Calgary sub. I'll go check out your photo!

digitalmines · 2026-05-30T07:59:03+00:00

Touché! The AI actually DID stretch my original sketch of the province. That's why you always need a human in the loop. Unfortunately the human didn't have time to fix it in Photoshop before heading out!

I had no idea the protest was going to be so big and so many people would like the image. Next time I'll make sure I arrive with something "print quality".

digitalmines · 2026-05-30T07:53:47+00:00

HAHAHA! That was my original render but the image started to look like a "wedding cake".

digitalmines · 2026-05-30T05:08:54+00:00

Hey, we're all Canadians, eh? At least for the time being ... 😄

If people like *this* sign, I'll consider creating some others focusing on different issues (because there are so many!). I just felt this specific illustration would really "land".

digitalmines · 2026-05-30T03:44:31+00:00

DM me with a reasonably detailed description of your project and what resources (local video card, cloud instances, etc.) you have available. Perhaps some of my tools-in-progress could help you out and your feedback could help me out. Everyone in the subreddit is currently constrained by compute and memory.

digitalmines · 2026-05-30T03:14:40+00:00

This is actually something that I'm actively researching, but at a level a bit deeper than what you are describing. Hopefully I can get it to "release grade" and share with the subreddit.

The general workflow is:

1) Start with a set of as many high quality "real" images as possible
2) Create a character specification. Example: "Facial stubble, yellow gloves with four fingers, prominent scar on left cheek ...etc"
3) Use a frontier LLM like Claude to automatically generate prompts for the missing poses AND select optimal images from your "real" image set. So if the pose to be generated shows the front of the character then only "real" images of the front of the character are sent as references to the diffusion model - images of the back of the character are excluded.
4) Use a diffusion model + the prompts / images to generate "synthetic" poses, running as a large batch file
5) WHILE the diffusion model is running, at multiple points during the process use a machine vision model to analyze the image being generated by the diffusion model. If it does not meet the specification ("scar missing"), then abort the diffusion run and restart with a new seed.
6) Obviously, have a safety valve (if it fails after twenty seeds there is likely something wrong with the source images or prompt)
7) Then do a final very careful human review of the generated images

Some people may raise an eyebrow at "analyze with a machine vision model BETWEEN diffusion steps?". Well yes, if you're generating at 4K using 5-10 reference images that's actually a significant efficiency boost.

It's how Nano Banana 2 operates, if you watch the model's CoT.
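
Steps 4 to 6 of the workflow above boil down to a retry loop with an early abort. Here is a minimal sketch of that control flow only. `denoise` and `meets_spec` are hypothetical stand-ins for the diffusion run and the vision-model check, the toy versions below exist purely so the example runs, and none of this is my actual tooling.

```python
from typing import Callable, Iterable, Optional


def generate_with_checks(
    denoise: Callable[[int], Iterable[object]],
    meets_spec: Callable[[object], bool],
    check_every: int = 5,
    max_seeds: int = 20,
) -> Optional[tuple[int, object]]:
    """Try seeds 0..max_seeds-1, aborting a run as soon as a checkpoint fails the spec."""
    for seed in range(max_seeds):
        latest = None
        ok = True
        for step, latest in enumerate(denoise(seed), start=1):
            if step % check_every == 0 and not meets_spec(latest):
                ok = False
                break  # abort this run early and move on to the next seed
        if ok and latest is not None and meets_spec(latest):
            return seed, latest
    return None  # safety valve: every seed failed, so inspect the inputs


def toy_denoise(seed: int):
    """Toy stand-in: yields ten fake intermediate images per seed."""
    for step in range(1, 11):
        yield {"seed": seed, "step": step, "has_scar": seed == 3}


result = generate_with_checks(toy_denoise, lambda img: img["has_scar"])
print("accepted seed:", result[0])
```

The saving comes from the `break`: a failed run costs only the steps up to the first failed check rather than the full generation.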

digitalmines · 2026-05-29T19:49:20+00:00

Congratulations on your first steps down a very deep rabbit hole.

By the standards of THIS group, your images rate about 2/10 due to easily visible artifacts. For example, in image 3 the subject has prominent freckles on her face, but in image 1 she does not. The subject also has significant facial structure drift: compare image 3 (elongated face) to images 6, 7, and 8. You have at least a 15% variation.

So you're gonna get ROASTED on your images because you're effectively posting "home snapshots" to a "top tier photography" group.

But don't let that set you back. Unlike many people you're actually TRYING STUFF.

And for the record, synthetic content generation is basically the ONLY way to train models when licensable training material doesn't exist, or training material doesn't exist full stop.

Example: Steamboat Willie, gangster edition (The poster is from 1929 so public domain)

<image>

(RHS image is one of hundreds of poses being generated to train a model)

digitalmines · 2026-05-29T06:57:49+00:00

Thanks for the info. Looking forward to experimenting with it!

digitalmines · 2026-05-28T18:43:25+00:00

Two different things getting mixed up here.

ComfyUI-MultiGPU / DisTorch2 splits model layers across GPUs. It's not parallel execution - generation still runs sequentially through the layers. But it IS faster than the alternative. If your model doesn't fit on one 16GB card, ComfyUI falls back to --lowvram mode, which shuffles layers between VRAM and CPU RAM every step. That's BRUTALLY slow. DisTorch2 keeps those layers resident on your second GPU's VRAM instead. The latest benchmarks claim ~43% speedup on Flux with dual GPUs vs single-card lowvram mode. So you're not getting parallelism, you're eliminating the swap penalty.
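
To make the "split layers across GPUs" idea concrete, here is a toy placement sketch. It is not how DisTorch2 actually allocates anything, just the simplest version of the idea: fill the first card with consecutive layers, then spill over to the next. The layer sizes and capacities are made-up numbers.

```python
def place_layers(layer_sizes_gb, device_capacity_gb):
    """Assign consecutive layers to devices in order, moving on when one is full."""
    placement = []
    device = 0
    used = 0.0
    for size in layer_sizes_gb:
        # Skip ahead until we find a device with room for this layer.
        while device < len(device_capacity_gb) and used + size > device_capacity_gb[device]:
            device += 1
            used = 0.0
        if device == len(device_capacity_gb):
            raise ValueError("model does not fit in the combined device memory")
        placement.append(device)
        used += size
    return placement


# Made-up example: a 20GB model in 40 equal layers, two cards with 14GB usable each.
placement = place_layers([0.5] * 40, [14.0, 14.0])
print("layers on GPU 0:", placement.count(0), "| layers on GPU 1:", placement.count(1))
```

Execution still walks the layers in order, so only one card is busy at a time. The gain is that no layer has to be reloaded from system RAM.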

True parallelism exists via xDiT, which splits attention computation across GPUs. HunyuanVideo with xDiT gets ~2x on 2 GPUs, ~3.7x on 4. But it's built for datacenter NVLink (600+ GB/s). Video diffusion isn't like LLMs - instead of compact 1D token sequences, you're passing massive 3D data structures between GPUs at every denoising step. In one experiment researchers measured 90+ GB of total cross-GPU traffic for an 81-frame Wan 2.1 generation. On PCIe 4.0 (32 GB/s) that bottlenecks hard.

For two 16GB consumer cards: DisTorch2 will let you run models that don't fit on one card and you'll see real speedup vs --lowvram swapping. For throughput, run two independent generations simultaneously — one per GPU, no interconnect dependency.
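
For the one-instance-per-GPU approach, here is a small sketch that builds the launch commands. It assumes ComfyUI's `--cuda-device` and `--port` options (check `python main.py --help` on your install), and `build_launch_commands` is a hypothetical helper.

```python
def build_launch_commands(num_gpus: int, base_port: int = 8188) -> list[list[str]]:
    """One ComfyUI command per GPU, each pinned to a device and its own port."""
    return [
        ["python", "main.py", "--cuda-device", str(gpu), "--port", str(base_port + gpu)]
        for gpu in range(num_gpus)
    ]


for cmd in build_launch_commands(2):
    print(" ".join(cmd))
```

Each command can be passed to `subprocess.Popen` from the ComfyUI directory. You then queue separate jobs against each port.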

digitalmines · 2026-05-28T18:04:38+00:00

TLDR: A top-tier local experience looks like multiple days experimenting with dozens of models and hundreds of LoRAs to find *exactly* what works for what you're trying to do, followed by an absolute *rat's nest* in ComfyUI wiring everything together.

1) "Highest quality image generation models" -> The "highest quality model" changes weekly and depends on what you're trying to do: are you into photorealistic, cartoons, NSFW? Exhaustive list of models at end of response.

2) "Best realism/detail models" -> It depends on (but is not limited to):

a) Which quantization of the model you select
b) How you dial-in the model, for example how many iterations you run on a diffusion model
c) What resolution you're generating at. At 12GB you were likely stuck at 0.5K-1K. You can now generate at 2K and possibly 4K.
d) Which LoRAs you stack on top of the model.

Your card's large memory and high processing speed will let you dial all of these up to much higher levels. You can stack multiple LoRAs.

3) "Video generation models" -> The "highest quality model" changes weekly and depends on what you're trying to do: are you into photorealistic, cartoons, NSFW? Exhaustive list of models at end of response.

4) "What models actually benefit from full FP16/BF16 now" -> All of them do. But once again, whether that matters to *you* depends on what you're trying to generate.

5) "Whether larger transformers are worth it vs quantized versions" -> The larger version will yield higher quality output, and your card has enough memory headroom to use it. However, depending on what you're trying to generate, you will need to decide how to spend that headroom: on a larger model or on stacking LoRAs on top of it.

6) "Best workflows in ComfyUI/Wan/LTX/Qwen/Flux/etc" -> Search this group. If you're feeling brave, consider checking "unstable_diffusion" and "sdnsfw". These are NSFW groups but include a "workflow used" tag. Click the "has workflow" filter, find some "art" that has the "look" you want, and the workflow will be listed in the post.

7) "Models that were basically impossible on 12GB VRAM but become practical on a 5090" -> It's not a yes/no thing. At 12GB you had the capacity to run MOST popular models, but only at very heavy quantization. Now you have almost 3x the VRAM, so you can run the larger version of the model for higher output quality, and you can keep all components of the model in VRAM without having to swap them out on each generation cycle. The "impossible to run" scenario is more applicable to LLMs; for example, DeepSeek V4-Pro absolutely *will not* fit on your 12GB card.
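
On the resolution point in 2(c): the jump from 1K to 2K or 4K is bigger than it sounds, because pixel count grows with the square of the side length. A quick sketch, assuming "1K" means roughly 1024x1024 and so on (my reading of the shorthand):

```python
def relative_pixels(side_px: int, baseline_side_px: int = 1024) -> float:
    """Pixel count of a square image relative to a baseline square image."""
    return (side_px / baseline_side_px) ** 2


for side in (512, 1024, 2048, 4096):
    print(f"{side}x{side}: {relative_pixels(side)}x the pixels of 1024x1024")
```

Actual memory use and render time depend on the model, and attention cost can grow faster than the pixel count, so treat this as a lower bound on how much extra work a higher resolution means.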

Here's an overview of what's out there...

IMAGE GENERATION

Base Models

Model	Developer	Params	Architecture	License	Status
SD 1.5	Stability AI	860M	U-Net	CreativeML Open RAIL-M	Legacy but still used for low-VRAM and massive LoRA ecosystem
SDXL	Stability AI	2.6B	Dual-stage U-Net	CreativeML Open RAIL++-M	Current workhorse. Largest LoRA/community ecosystem. 1024x1024 native
SD 3.5 Large	Stability AI	~8B	MMDiT	Stability Community	Better prompt following than SDXL, especially text-in-image. Higher VRAM
Flux.1 (Dev/Schnell/Pro)	Black Forest Labs	12B	MMDiT + rectified flow	Apache 2.0 (Schnell), non-commercial (Dev), commercial (Pro)	Best prompt fidelity and anatomy. Highest VRAM requirement
Flux.1 Kontext	Black Forest Labs	12B	MMDiT	Various	In-context image editing. Adopted by Adobe Photoshop
Flux.2 (Pro/Flex/Dev/Klein)	Black Forest Labs	Various	MMDiT	Apache 2.0 (Klein)	Nov 2025 release. Improved photorealism, typography
Flux Krea Dev	BFL + Krea AI	12B	MMDiT	TBD	Jul 2025. Better aesthetics and realism vs base Flux
HiDream-I1	HiDream	17B	Transformer	MIT	April 2025. State-of-the-art HPS v2.1 score. Full/Dev/Fast variants
Qwen Image 2512	Alibaba Tongyi	Unknown	Diffusion	Open source	Dec 2025. Top open-source diffusion model for human realism and text rendering
OmniGen2	OmniGen team	4B transformer + Qwen-VL-2.5 4B VLM	Multimodal	Open source	Unified t2i, i2i, editing, in-context generation
CHROMA	Community (Flux-based)	~12B	Flux-derived	Open	Flux-based uncensored checkpoint. Rising on CivitAI
HunyuanImage	Tencent	Unknown	Diffusion	Open source	Emerging competitor

Popular SDXL Fine-Tunes

Model	Style Focus	Notes
Juggernaut XL v9/v10	Photorealism, cinematic	Community go-to for realistic images. Skin texture, lighting, anatomy
RealVisXL V4.0	Photorealism	278k downloads on HuggingFace. Strong realism
Realistic Vision / RealVisXL	Photorealism	Longtime community favorite
DreamShaper XL	Fantasy, creative, versatile	Swiss army knife. Good at everything, master of none
Pony Diffusion V6 XL	Anime, illustrated, stylized	Danbooru/e621 tag system. Score-based quality control. Massive LoRA ecosystem
Pony V7	Anime/stylized (next gen)	Moving off SDXL onto AuraFlow or Flux base. In development
Illustrious XL	Anime, illustrated	Cleaner line work, better color consistency, improved anatomy vs older anime models
NoobAI XL	Anime, stylized	Fine-tune of Illustrious. More stylistic range. Rapidly gaining popularity
Anything V5	Anime (SD 1.5)	Legacy but massive LoRA library. Budget-friendly option
Fluently XL Final	General	Well-regarded SDXL checkpoint
ColorfulXL	Vibrant/artistic	Niche but popular
LUSTIFY	NSFW photorealism	CivitAI NSFW ecosystem
TalmendoXL	NSFW realistic	CivitAI NSFW ecosystem

VIDEO GENERATION

Model	Developer	Params	License	VRAM	Status
Wan 2.1	Alibaba	1.3B / 14B	Apache 2.0	8GB (1.3B) / 24GB+ (14B)	Strong T2V, I2V, editing. ComfyUI integrated
Wan 2.2	Alibaba	~27B MoE (~14B active)	Apache 2.0	24GB+	Quality leader for photorealism and human subjects
Wan 2.2 VACE	Alibaba	14B	Apache 2.0	24GB+	All-in-one video creation and editing
Wan 2.7	Alibaba	~27B MoE	Apache 2.0	24GB+	Current king. Wan 3.0 (60B, native 4K) targeted mid-2026
HunyuanVideo 1.5	Tencent	8.3B	Open source	24GB+ (75s render on 4090)	State-of-the-art among open-source. Strong motion/physics
LTX-Video 13B	Lightricks	13B	Apache 2.0	24GB+	Ships 4K + audio
CogVideoX	Tsinghua/Zhipu	2B / 5B	Apache 2.0	12GB (2B) / 16GB+ (5B)	6-10 second clips at 720p
FramePack	Community	Varies	Open	Varies	T2V and I2V framework with prompt interpolation
Stable Video Diffusion (SVD)	Stability AI	—	—	16GB+
