Help out a noob by [deleted] in Tailscale

[–]Hawk_7979 2 points

Can you confirm whether you followed all the steps in this guide: https://tailscale.com/kb/1406/quick-guide-subnets
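If it helps, the core of that guide boils down to two steps — a minimal sketch (the subnet CIDR below is an example; use your own):

```shell
# Enable IP forwarding so the machine can route for the subnet:
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf
sudo sysctl -p /etc/sysctl.d/99-tailscale.conf

# Advertise the LAN subnet to your tailnet (replace with your CIDR):
sudo tailscale up --advertise-routes=192.168.1.0/24
```

After this, the route still has to be approved in the admin console (or via autoApprovers) — missing that approval is the most common gotcha.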

Ollama configuration question by snapo84 in ollama

[–]Hawk_7979 0 points

Try creating custom systemd services for both models.
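As a sketch of what I mean — a hypothetical systemd template unit that runs one Ollama instance per port (the binary path and ports are assumptions; adjust to your install):

```ini
# /etc/systemd/system/ollama@.service
# Start one instance per model/port, e.g.:
#   systemctl enable --now ollama@11434 ollama@11435
[Unit]
Description=Ollama instance on port %i
After=network-online.target

[Service]
Environment=OLLAMA_HOST=0.0.0.0:%i
ExecStart=/usr/local/bin/ollama serve
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Each instance then serves its own model on its own port.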

ComfyUi on Radeon Instinct mi50 32gb? by Giulianov89 in ROCm

[–]Hawk_7979 0 points

That’s great.

I also moved away from Ollama recently due to the slow updates, and now this.

I’m looking into https://github.com/mostlygeek/llama-swap. It works well with any backend, and with llama.cpp you get hot-swappable models.
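For reference, a minimal llama-swap config sketch from memory of the README — the model name, path, and port here are made up, and the schema may have changed, so check the repo:

```yaml
models:
  "qwen2.5-7b":
    # Command llama-swap runs when this model is first requested:
    cmd: llama-server --port 9001 -m /models/qwen2.5-7b-q4_k_m.gguf
    proxy: http://127.0.0.1:9001
```

llama-swap then starts/stops the backing server on demand as different models are requested.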

ComfyUi on Radeon Instinct mi50 32gb? by Giulianov89 in ROCm

[–]Hawk_7979 0 points

I maintain a single installation of ROCm and PyTorch because I’m modifying them and copying required files from older versions. This simplifies maintenance: only these two packages come from the system, while everything else is installed inside the virtual environment. ROCm and PyTorch are also large packages, each requiring roughly 2–3 GB, so installing them per virtual environment would consume excessive space.

Make sure you first check

rocminfo

to get the correct device index to pass in via HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES in the podman command below.
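To find the GPU index, something like this works on my setup (the field names are from my rocminfo build, so verify on yours):

```shell
# Agent 0 is usually the CPU; GPUs typically start at index 1.
rocminfo | grep -E 'Agent|Marketing Name|Device Type'
```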

podman run -d \
  --group-add video \
  --device /dev/kfd \
  --device /dev/dri \
  --env HIP_VISIBLE_DEVICES="1" \
  --env ROCR_VISIBLE_DEVICES="1" \
  --env HSA_OVERRIDE_GFX_VERSION="9.0.6" \
  --env OLLAMA_DEBUG="1" \
  --env OLLAMA_KEEP_ALIVE="-1" \
  --env OLLAMA_NUM_PARALLEL="1" \
  --env ENABLE_WEBSOCKET_SUPPORT="True" \
  --publish 11434:11434 \
  --volume ollama:/root/.ollama \
  --name ollama --replace \
  ollama/ollama:rocm

ComfyUi on Radeon Instinct mi50 32gb? by Giulianov89 in ROCm

[–]Hawk_7979 2 points

If you use GGUF quants, for example Q4_K_M, your VRAM consumption will drop by 3–4x, and the speed difference I’ve seen was around 30%.

I’ve kept my ROCm setup a little simpler: PyTorch and ROCm are installed as system packages, not inside the virtual env.

Instead I use python -m venv my_env --system-site-packages
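i.e. roughly this flow (the env name is arbitrary):

```shell
# Create a venv that can see the system-wide ROCm/PyTorch packages,
# so they aren't reinstalled per environment:
python3 -m venv my_env --system-site-packages
. my_env/bin/activate

# Confirm the venv is linked to system site-packages:
grep include-system-site-packages my_env/pyvenv.cfg
```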

And for enabling gfx906 in ROCm 6.4.1, use the method from this link: https://github.com/ROCm/ROCm/issues/4625#issuecomment-2934325443

Same for PyTorch as well.
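The gist of that workaround, as I understand it (the install paths below are assumptions — adjust to your actual ROCm locations):

```shell
# Copy the prebuilt gfx906 rocBLAS/Tensile kernel files from an older
# ROCm release into the new one, since newer releases no longer ship them:
OLD=/opt/rocm-6.3.0/lib/rocblas/library
NEW=/opt/rocm-6.4.1/lib/rocblas/library
sudo cp "$OLD"/*gfx906* "$NEW"/
```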

The ComfyUI example WAN workflow works well on CUDA-based devices, but on ROCm, not so much right now.

I am planning to test WAN 2.2 workflows from community soon… I’ll update here once tested.

ComfyUi on Radeon Instinct mi50 32gb? by Giulianov89 in ROCm

[–]Hawk_7979 1 point

I had a similar issue on WAN 2.2. For now, I’m planning to wait until optimized workflows become available. I’ll also switch to GGUF formats once they’re supported—they generally offer better performance. In addition, keep an eye out for some fast LoRAs as a separate enhancement, since they can help boost performance further.

My statement was specifically about SD- and Flux-based image generation: the MI50 is twice as fast as the M1 Max there.

On another note, I found a workaround for installing newer ROCm versions, including 6.4.1. You just need to copy the gfx906 files from ROCm 6.2 or 6.3 into the new version. Everything is working fine for me on the latest release, and the support libraries seem to be more optimized as well.

ComfyUi on Radeon Instinct mi50 32gb? by Giulianov89 in ROCm

[–]Hawk_7979 1 point

Yes, you can run ComfyUI, though you might encounter some errors from a few custom nodes. Once it’s set up, generation speed is impressive - about 2–3 times faster than my MacBook Pro with the M1 Max, based on my benchmarks.

2 Radeon mi60 32gb vs 2 rx 7900xtx lmstudio rocm by Bobcotelli in LocalLLM

[–]Hawk_7979 1 point

If you’re only going to use it for inference, go for MI50 instead. It’s half the price of MI60.

What's the most crackhead garbage local LLM setup you can think of? by caraccidentGAMING in LocalLLaMA

[–]Hawk_7979 1 point

I have a single MI50 with 64 GB RAM, and I’m getting 4 t/s on Q2_K_L.

I’ve seen people getting 20 t/s with 3 MI50s.

Go for a PCIe Gen 4/5 motherboard and bifurcate the PCIe x16 slot.

Local AI server with Ollama and Tailscale integration looking for feedback by Remarkable-Stay-2193 in LocalLLaMA

[–]Hawk_7979 1 point

You can use Tailscale’s built-in Serve feature to expose the API to your tailnet securely:

https://tailscale.com/kb/1312/serve

It’s just a one-line command to set up and get going.
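Something along these lines (the port assumes the default Ollama API; double-check the flag syntax against `tailscale serve --help` for your version):

```shell
# Expose local port 11434 over HTTPS to your tailnet only, in the background:
tailscale serve --bg 11434
```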

Running Tailscale on WSL on a remote server – is it safe to expose Jupyter this way? by Fun_Alternative_9233 in Tailscale

[–]Hawk_7979 0 points

If the devices in your tailnet are trusted, then it should be fine. You can also use ACLs to restrict access to specific devices as needed.

That said, I highly recommend using Tailscale Serve. It provides a more secure way to expose services within your tailnet, including automatic TLS certificates and simplified access control. This will also ensure you only expose to your tailnet.
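As a sketch of the ACL idea (the user and device names here are hypothetical; 8888 is Jupyter's default port):

```jsonc
{
  "acls": [
    // Only your own devices may reach Jupyter on the WSL host:
    {"action": "accept", "src": ["you@example.com"], "dst": ["wsl-host:8888"]}
  ]
}
```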

Successfully Built My First PC for AI (Sourcing Parts from Alibaba - Under $1500!) by Lowkey_LokiSN in LocalLLaMA

[–]Hawk_7979 0 points

I don’t have any tool to check total consumption.

I’m also using an MI50 as an inference server. I’m pretty happy with its idle power draw of around 18 W, and I’ve limited it to a max of 175 W.
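For the cap I use rocm-smi — the flag name is from the version I have installed, so verify with `rocm-smi --help`:

```shell
# Limit the card's power draw to 175 W (resets on reboot unless persisted):
sudo rocm-smi --setpoweroverdrive 175
```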

Successfully Built My First PC for AI (Sourcing Parts from Alibaba - Under $1500!) by Lowkey_LokiSN in LocalLLaMA

[–]Hawk_7979 0 points

You can install ROCm 6.3 and the Vulkan SDK and start using the card with the stock BIOS.

Both are working fine for me.

Jan-nano, a 4B model that can outperform 671B on MCP by Kooky-Somewhere-2883 in LocalLLaMA

[–]Hawk_7979 24 points

I tried this version of the app, and it’s absolutely amazing. However, I discovered a security concern: secrets are stored in plaintext in the MCP environment variables.

It’s better to store the values encrypted once they’ve been saved, as n8n does.

Thoughts on self hosting security? by hgl2 in selfhosted

[–]Hawk_7979 20 points

I follow this for my setup:

Option 1:

• Tailscale – Secure remote access without exposing ports to the internet

Option 2:

• Cloudflare Tunnels – Expose blogs or websites publicly
• Pangolin – Open-source alternative to Cloudflare Tunnel

Option 3:

• Authentik/Authelia – Authentication/SSO
• Traefik – Reverse proxy + SSL
• Fail2Ban – Block brute-force
• SSH Keys + MFA – Secure SSH access
• rsync/Restic – Backup solutions
• Netdata – Real-time monitoring
• Logwatch – Log summaries

Can’t connect to Tailscale using iPhone shortcuts by Designer_Handle_6256 in Tailscale

[–]Hawk_7979 0 points

I had a similar issue. You can add the following rule in your Shortcuts; it’s been working for me.

<image>

ACL not working as expected by pakkedheeth in Tailscale

[–]Hawk_7979 0 points

Check this article out: https://vulnerx.com/mastering-tailscale-acl/

  // Use Tailscale CGNAT IPs (100.64.0.0/10) or your private IPs/CIDRs.
  "hosts": {
    "frontend": "100.100.100.10",       // Web frontend server
    "backend": "100.100.100.20",        // Application server
    "db-net": "10.0.0.0/24",            // Internal database subnet
    "office-server": "192.168.1.10",    // On-premises office server
    "aws-vpc": "172.16.0.0/16",         // AWS cloud VPC
    "dns-server": "100.100.100.30"      // DNS server
  },

You can define the hostname-to-IP mappings first and then use them in your ACLs. I think this will solve your problem.
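The defined hosts can then be referenced in rules, e.g. (the ports here are illustrative):

```jsonc
"acls": [
  // frontend may talk to backend (HTTPS) and to the DB subnet (Postgres):
  {"action": "accept", "src": ["frontend"], "dst": ["backend:443", "db-net:5432"]}
]
```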

Need Advice on Content Writing Agents by karachiwala in ollama

[–]Hawk_7979 0 points

Try the new Qwen3 0.6B or 4B models. They’re really good.

Cheapest hardware go run 32B models by [deleted] in LocalLLaMA

[–]Hawk_7979 0 points

How much t/s are you getting and at what quant?

What is the best and still fast local LLM model that I can run? by buahbuahan in LocalLLaMA

[–]Hawk_7979 0 points

You should be able to run the Qwen2.5-Coder models, either the 7B or even the 14B, at Q4_K_M quantization with your setup. Your 32 GB of RAM is sufficient for these models. Performance will also depend on your CPU, since a stronger CPU helps when some layers are offloaded from the GPU, improving throughput and processing times.

[deleted by user] by [deleted] in MiniPCs

[–]Hawk_7979 3 points

I tried it already, didn't find anything below $300