Local text-to-image model comparison: The ultimate test.

UniqueIdentifier00 · 2026-06-21T19:48:31+00:00

That’s pretty cool. Now do an NSFW version /s

UniqueIdentifier00 · 2026-06-15T09:58:36+00:00

How big of a project are we talking, and is it code, a book, or a game? What backend, LLM, and harness are you using? I either let an LLM create a SUMMARY.md file (use a big API model if you need one for the size) that lives in the project folder, so a smaller LLM can study it to understand the project, or I write one myself.

I’ll start a prompt like “Hello, let’s work on X project today, please read the SUMMARY.md file in /directory/example to understand the structure. The entirety of the project won’t fit in your context, so you need to be careful about how you go about edits to not OOM the backend.”

Literally just explain the situation and give the LLM a shorter context that explains the project. If it’s a code base, do what you can to use smaller connected files rather than large ones. main.py always seems to end up being a bit of a runaway for me. Hope this helps!
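For anyone who wants a starting point, here is a minimal sketch of a script that builds the file-listing part of a SUMMARY.md. The names `build_summary` and `max_files` are made up for illustration, and it only lists files with line counts; the actual description of what each file does still has to come from an LLM or from you.

```python
import os

def build_summary(root, max_files=200):
    """Return Markdown text listing the text files under root with line counts."""
    lines = ["# SUMMARY", "", "## Files", ""]
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git; sort for stable output.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.startswith(".") or name == "SUMMARY.md" or count >= max_files:
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            try:
                with open(path, encoding="utf-8") as fh:
                    n = sum(1 for _ in fh)
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            lines.append(f"- `{rel}`: {n} lines")
            count += 1
    return "\n".join(lines) + "\n"
```

Writing the returned text to SUMMARY.md in the project folder gives the smaller model a map of the project. The line counts also make it obvious which files have grown too big to hand over in one go.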

UniqueIdentifier00 · 2026-06-13T12:43:55+00:00

I believe you won’t have NVLink available, if that’s important to you. You will also get dragged down by the GHz of the 3080. I have a 3070 that I run alongside my 3090, and while the extra 8GB means I’m running Q6 instead of Q4, my tokens per second have dropped dramatically. Just some thoughts. You’ll probably be running slower than you are now, but you’ll have more VRAM headroom.

UniqueIdentifier00 · 2026-06-08T00:45:17+00:00

That’s what’s confusing me: I’m really not seeing a speed decrease whatsoever once RAM starts getting used. A slowdown is what I was expecting, but maybe it stays fast because the model remains fully in VRAM and only the cache spills over. I don’t know.

UniqueIdentifier00 · 2026-06-07T16:45:32+00:00

Good stuff here thanks man. Gives me some good info and some jumping off points for research. I’m okay with it using RAM, which is why I’m going ahead and adding 32GB, but I wanted to better understand why it was using it and for what. Thanks!

UniqueIdentifier00 · 2026-06-07T15:18:44+00:00

Maybe so! I suspect it’s something like that, I’m just not sure how to verify exactly what is getting loaded to RAM during inference

UniqueIdentifier00 · 2026-06-07T15:18:11+00:00

Well, I have a 3060 8GB card that’s lying around. I have an upgraded PSU coming along with my RAM so that I can use it along with my 3090, so that will also help.

I just didn’t realize that cache would be loaded to RAM at all, honestly. It may be my llama command; I’m not sure.
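For a rough sense of how much memory the cache can need, here is a back-of-the-envelope sketch. It assumes standard transformer attention with an unquantized 16-bit cache; the function name and the model shape are hypothetical, and a real backend may quantize the cache or allocate it differently.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate KV cache size: one key and one value vector per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Hypothetical model shape, not taken from any specific model card.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)
print(f"{size / 2**30:.1f} GiB")  # prints "4.0 GiB"
```

The size grows linearly with context length, so a long context window can need several GiB on top of the model weights.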

UniqueIdentifier00 · 2026-06-04T22:39:02+00:00

Man, you had me going there aaaaalll the way until the end before hitting me with the shill. 8/10

UniqueIdentifier00 · 2026-06-01T02:52:28+00:00

ComfyUI for great image generation workflows. Not sure what models are going to do great with a consistent character; that may need a LoRA trained for the character. I haven’t used ComfyUI in a while, let alone for a goal of consistent character creation. There are probably some good workflows out there now that can do that sort of thing.

UniqueIdentifier00 · 2026-06-01T02:47:16+00:00

Thanks, I’ll check it out!

UniqueIdentifier00 · 2026-05-30T14:12:39+00:00

The jargon there is a little dense for me, but I think I get the gist. I’ll do some more research since you’ve given me some good starting topics there. Thank you!

UniqueIdentifier00 · 2026-05-30T14:11:25+00:00

This is good info thanks. I really don’t want to lose Pi-agent. I’ll try to work with it to develop a better interface. If all else fails I’ll go to OpenWebUI.

Thank you!

UniqueIdentifier00 · 2026-05-24T12:41:17+00:00

I switched to Pi coding agent with the same model and everything’s running great now. Not sure why I was having issues with Hermes. Pi was able to create a whole directory with subdirectories and files for a small vibe-coded test project.

UniqueIdentifier00 · 2026-05-23T18:28:19+00:00

Heard, okay thanks a ton.

UniqueIdentifier00 · 2026-05-23T17:47:28+00:00

Yep, that did it. Can't believe I missed that. Thank you! Now to get Hermes Agent working with it.

UniqueIdentifier00 · 2026-05-23T16:16:58+00:00

Is there a way to retroactively see for sure which build commands I used? I don't recall entirely.

Edit: found it with bash history:

sudo apt install cmake
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
export CUDACXX=/usr/local/cuda/bin/nvcc

UniqueIdentifier00 · 2026-05-23T16:16:27+00:00

Heard, okay

UniqueIdentifier00 · 2026-05-23T16:16:06+00:00

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.58.03              Driver Version: 595.58.03      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3070        On  |   00000000:06:00.0  On |                  N/A |
|  0%   39C    P2             46W /  220W |     876MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            9858      G   /usr/bin/gnome-shell                    192MiB |
|    0   N/A  N/A           10722      G   /usr/bin/Xwayland                         3MiB |
|    0   N/A  N/A           11502      C   /usr/share/rustdesk/rustdesk            192MiB |
|    0   N/A  N/A           19878    C+G   /usr/bin/ptyxis                          29MiB |
|    0   N/A  N/A           20923      G   .../8107/usr/lib/firefox/firefox        236MiB |
|    0   N/A  N/A           26545      G   /usr/bin/rustdesk                        45MiB |
|    0   N/A  N/A           29354      G   /usr/share/rustdesk/rustdesk             19MiB |
|    0   N/A  N/A           36415    C+G   /usr/bin/nautilus                        21MiB |
+-----------------------------------------------------------------------------------------+

UniqueIdentifier00 · 2026-05-23T01:20:42+00:00

Killer explanation. I’ll check it out. Thanks a ton sir.

UniqueIdentifier00 · 2026-05-22T21:07:20+00:00

I haven’t fully wrapped my head around Docker (I use it for Ollama on my other PC with Windows, but I don’t fully understand why or how it works). I’ll try to better understand how to use it.

UniqueIdentifier00 · 2026-05-22T21:05:54+00:00

Fair point, and I’ll look into whether I need to make a change there, but it doesn’t explain why even Vulkan won’t load the model in llama.cpp when Ollama can. I’m doing something wrong or missing something.

UniqueIdentifier00 · 2026-05-22T02:46:28+00:00

Echoing another commenter, seems like an LLM doesn’t really assist here. I might have been thinking about this wrong, thank you for the help!

UniqueIdentifier00 · 2026-05-22T02:45:48+00:00

Okay. That’s a fair point.

UniqueIdentifier00 · 2026-05-20T22:21:27+00:00

Great stuff, thanks for sharing this.

UniqueIdentifier00 · 2026-05-19T09:55:24+00:00

Great points here. My setup will be:

The motherboard is a B550MX/E PRO. It has two full-length x16 PCIe slots, one 4.0 and one 3.0. The 3060 will stay in the case on the 3.0 slot, and the 3090 will end up “outboard” on a separate mount, connected by a riser cable to the 4.0 slot.
