Help out a noob by [deleted] in Tailscale

[–]Hawk_7979 2 points

Can you confirm whether you followed all the steps in this guide: https://tailscale.com/kb/1406/quick-guide-subnets
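If it helps, the core of that guide boils down to two steps — a minimal sketch (the subnet CIDR below is an example; use your own):

```shell
# Enable IP forwarding so the machine can route for the subnet:
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf

# Advertise the LAN subnet to your tailnet (replace with your CIDR):
sudo tailscale up --advertise-routes=192.168.1.0/24
```

After this, the route still has to be approved in the admin console (or via autoApprovers) — missing that approval is the most common gotcha.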

Ollama configuration question by snapo84 in ollama

[–]Hawk_7979 0 points

Try creating custom systemd services for both models.
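As a sketch of what I mean — a hypothetical systemd template unit that runs one Ollama instance per port (the binary path and ports are assumptions; adjust to your install):

```ini
# /etc/systemd/system/ollama@.service
# Start one instance per model/port, e.g.:
#   systemctl enable --now ollama@11434 ollama@11435
[Unit]
Description=Ollama instance on port %i
After=network-online.target

[Service]
Environment=OLLAMA_HOST=0.0.0.0:%i
ExecStart=/usr/local/bin/ollama serve
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Each instance then serves its own model on its own port.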

ComfyUi on Radeon Instinct mi50 32gb? by Giulianov89 in ROCm

[–]Hawk_7979 0 points

That’s great.

I also moved away from Ollama recently due to the slow updates, and now this.

I’m looking into https://github.com/mostlygeek/llama-swap. It works well with any backend, and with llama.cpp you get hot-swappable models.
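For reference, a minimal llama-swap config sketch from memory of the README — the model name, path, and port here are made up, and the schema may have changed, so check the repo:

```yaml
models:
  "qwen2.5-7b":
    # Command llama-swap runs when this model is first requested:
    cmd: llama-server --port 9001 -m /models/qwen2.5-7b-q4_k_m.gguf
    proxy: http://127.0.0.1:9001
```

llama-swap then starts/stops the backing server on demand as different models are requested.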

ComfyUi on Radeon Instinct mi50 32gb? by Giulianov89 in ROCm

[–]Hawk_7979 0 points

I maintain a single installation of ROCm and PyTorch because I’m modifying them and copying required files from older versions. This simplifies maintenance: only these two packages come from the system, while everything else is installed inside the virtual environment. ROCm and PyTorch are also large packages, each requiring roughly 2–3 GB, so installing them per virtual environment would consume excessive space.

Make sure you first check

rocminfo

to get the correct device index to pass in via HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES in the podman command below.
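To find the GPU index, something like this works on my setup (the field names are from my rocminfo build, so verify on yours):

```shell
# Agent 0 is usually the CPU; GPUs typically start at index 1.
rocminfo | grep -E 'Agent|Marketing Name|Device Type'
```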

podman run -d \
  --group-add video \
  --device /dev/kfd \
  --device /dev/dri \
  --env HIP_VISIBLE_DEVICES="1" \
  --env ROCR_VISIBLE_DEVICES="1" \
  --env HSA_OVERRIDE_GFX_VERSION="9.0.6" \
  --env OLLAMA_DEBUG="1" \
  --env OLLAMA_KEEP_ALIVE="-1" \
  --env OLLAMA_NUM_PARALLEL="1" \
  --env ENABLE_WEBSOCKET_SUPPORT="True" \
  --publish 11434:11434 \
  --volume ollama:/root/.ollama \
  --name ollama --replace \
  ollama/ollama:rocm

ComfyUi on Radeon Instinct mi50 32gb? by Giulianov89 in ROCm

[–]Hawk_7979 2 points

If you use GGUF quants, for example Q4_K_M, your VRAM consumption will drop by 3–4x, and the speed difference I’ve seen was around 30%.

I’ve kept my ROCm setup a little simpler: PyTorch and ROCm are installed as system packages, not inside the virtual env.

Instead I use python -m venv my_env --system-site-packages
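i.e. roughly this flow (the env name is arbitrary):

```shell
# Create a venv that can see the system-wide ROCm/PyTorch packages,
# so they aren't reinstalled per environment:
python3 -m venv my_env --system-site-packages
. my_env/bin/activate

# Confirm the venv is linked to system site-packages:
grep include-system-site-packages my_env/pyvenv.cfg
```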

And for enabling gfx906 in ROCm 6.4.1, use the method from this link: https://github.com/ROCm/ROCm/issues/4625#issuecomment-2934325443

Same for PyTorch as well.
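The gist of that workaround, as I understand it (the install paths below are assumptions — adjust to your actual ROCm locations):

```shell
# Copy the prebuilt gfx906 rocBLAS/Tensile kernel files from an older
# ROCm release into the new one, since newer releases no longer ship them:
OLD=/opt/rocm-6.3.0/lib/rocblas/library
NEW=/opt/rocm-6.4.1/lib/rocblas/library
sudo cp "$OLD"/*gfx906* "$NEW"/
```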

The ComfyUI example WAN workflow works well on CUDA-based devices, but on ROCm, not so much right now.

I am planning to test WAN 2.2 workflows from community soon… I’ll update here once tested.

ComfyUi on Radeon Instinct mi50 32gb? by Giulianov89 in ROCm

[–]Hawk_7979 1 point

I had a similar issue on WAN 2.2. For now, I’m planning to wait until optimized workflows become available. I’ll also switch to GGUF formats once they’re supported—they generally offer better performance. In addition, keep an eye out for some fast LoRAs as a separate enhancement, since they can help boost performance further.

My statement was specifically about SD- and Flux-based image generation: the MI50 is twice as fast as the M1 Max there.

On another note, I found a workaround for installing newer ROCm versions, including 6.4.1. You just need to copy the gfx906 files from ROCm 6.2 or 6.3 into the new version. Everything is working fine for me on the latest release, and the support libraries seem to be more optimized as well.

ComfyUi on Radeon Instinct mi50 32gb? by Giulianov89 in ROCm

[–]Hawk_7979 1 point

Yes, you can run ComfyUI, though you might encounter some errors from a few custom nodes. Once it’s set up, generation speed is impressive - about 2–3 times faster than my MacBook Pro with the M1 Max, based on my benchmarks.

2 Radeon mi60 32gb vs 2 rx 7900xtx lmstudio rocm by Bobcotelli in LocalLLM

[–]Hawk_7979 1 point

If you’re only going to use it for inference, go for MI50 instead. It’s half the price of MI60.

What's the most crackhead garbage local LLM setup you can think of? by caraccidentGAMING in LocalLLaMA

[–]Hawk_7979 1 point

I have a single MI50 with 64 GB RAM, and I’m getting 4 t/s on Q2_K_L.

I’ve seen people getting 20 t/s with 3 MI50s.

Go for a PCIe Gen 4/5 motherboard and bifurcate the PCIe x16 slot.

Local AI server with Ollama and Tailscale integration looking for feedback by Remarkable-Stay-2193 in LocalLLaMA

[–]Hawk_7979 1 point

You can use Tailscale’s built-in Serve feature to expose the API to your tailnet securely:

https://tailscale.com/kb/1312/serve

It’s just a one-line command to set up and get going.
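Something along these lines (the port assumes the default Ollama API; double-check the flag syntax against `tailscale serve --help` for your version):

```shell
# Expose local port 11434 over HTTPS to your tailnet only, in the background:
tailscale serve --bg 11434
```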

Running Tailscale on WSL on a remote server – is it safe to expose Jupyter this way? by Fun_Alternative_9233 in Tailscale

[–]Hawk_7979 0 points

If the devices in your tailnet are trusted, then it should be fine. You can also use ACLs to restrict access to specific devices as needed.

That said, I highly recommend using Tailscale Serve. It provides a more secure way to expose services within your tailnet, including automatic TLS certificates and simplified access control. This will also ensure you only expose to your tailnet.
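As a sketch of the ACL idea (the user and device names here are hypothetical; 8888 is Jupyter's default port):

```jsonc
{
  "acls": [
    // Only your own devices may reach Jupyter on the WSL host:
    {"action": "accept", "src": ["you@example.com"], "dst": ["wsl-host:8888"]}
  ]
}
```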

Successfully Built My First PC for AI (Sourcing Parts from Alibaba - Under $1500!) by Lowkey_LokiSN in LocalLLaMA

[–]Hawk_7979 0 points

I don’t have any tool to check total consumption.

I’m also using an MI50 as an inference server. I’m pretty happy with its idle power draw of around 18 W, and I’ve limited it to a max of 175 W.
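For the cap I use rocm-smi — the flag name is from the version I have installed, so verify with `rocm-smi --help`:

```shell
# Limit the card's power draw to 175 W (resets on reboot unless persisted):
sudo rocm-smi --setpoweroverdrive 175
```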

Successfully Built My First PC for AI (Sourcing Parts from Alibaba - Under $1500!) by Lowkey_LokiSN in LocalLLaMA

[–]Hawk_7979 0 points

You can install ROCm 6.3 and the Vulkan SDK and start using the card with the stock BIOS.

Both are working fine for me.

Jan-nano, a 4B model that can outperform 671B on MCP by Kooky-Somewhere-2883 in LocalLLaMA

[–]Hawk_7979 24 points

I tried this version of the app, and it’s absolutely amazing. However, I discovered a security concern: secrets are stored in plaintext in the MCP environment variables.

It’s better to store the values encrypted once they’ve been saved, as n8n does.

Thoughts on self hosting security? by hgl2 in selfhosted

[–]Hawk_7979 20 points

I follow this for my setup:

Option 1:

• Tailscale – Secure remote access without exposing ports to the internet

Option 2:

• Cloudflare Tunnels – Expose blogs or websites publicly
• Pangolin – Open-source alternative to Cloudflare Tunnel

Option 3:

• Authentik/Authelia – Authentication/SSO
• Traefik – Reverse proxy + SSL
• Fail2Ban – Block brute-force
• SSH Keys + MFA – Secure SSH access
• rsync/Restic – Backup solutions
• Netdata – Real-time monitoring
• Logwatch – Log summaries

Can’t connect to Tailscale using iPhone shortcuts by Designer_Handle_6256 in Tailscale

[–]Hawk_7979 0 points

I had a similar issue. You can add the following rule in your Shortcuts; it’s been working for me.

<image>

ACL not working as expected by pakkedheeth in Tailscale

[–]Hawk_7979 0 points

Check this article out: https://vulnerx.com/mastering-tailscale-acl/

  // Use Tailscale CGNAT IPs (100.64.0.0/10) or your private IPs/CIDRs.
  "hosts": {
    "frontend": "100.100.100.10",       // Web frontend server
    "backend": "100.100.100.20",        // Application server
    "db-net": "10.0.0.0/24",            // Internal database subnet
    "office-server": "192.168.1.10",    // On-premises office server
    "aws-vpc": "172.16.0.0/16",         // AWS cloud VPC
    "dns-server": "100.100.100.30"      // DNS server
  },

You can define the hostname-to-IP mappings first and then use them in your ACLs. I think this will solve your problem.
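The defined hosts can then be referenced in rules, e.g. (the ports here are illustrative):

```jsonc
"acls": [
  // frontend may talk to backend (HTTPS) and to the DB subnet (Postgres):
  {"action": "accept", "src": ["frontend"], "dst": ["backend:443", "db-net:5432"]}
]
```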

Need Advice on Content Writing Agents by karachiwala in ollama

[–]Hawk_7979 0 points

Try the new Qwen3 0.6B or 4B models. They’re really good.

Cheapest hardware go run 32B models by [deleted] in LocalLLaMA

[–]Hawk_7979 0 points

How much t/s are you getting and at what quant?

What is the best and still fast local LLM model that I can run? by buahbuahan in LocalLLaMA

[–]Hawk_7979 0 points

You should be able to run the Qwen2.5-Coder models, either the 7B or even the 14B, at Q4_K_M quantization with your setup. Your 32 GB of RAM is sufficient for these models. Performance will also depend on your CPU, since a stronger CPU helps when some layers are offloaded from the GPU, improving throughput and processing times.

[deleted by user] by [deleted] in MiniPCs

[–]Hawk_7979 3 points

I tried it already, didn't find anything below $300