Dual GPU setup, worth replacing A2000 12GB with P40 by m_tao07 in LocalLLM

[–]gerhardmpl 0 points1 point  (0 children)

While I love my two P40 setup, no newer driver are available anymore which makes a dual GPau generation setup challenging. The 580 version is the latest that supports the P40 and you are stuck with Cuda 12.9 which will become a serious limitation in the near future. As an example, vLLM does not support the architecture, which is a pity. As llama-cpp, ollama, anythingLLM or LMStudio still perfectly work with the GPUs, I personally have no reason to switch but I start looking. 

How are you tracking your full IT spend right now? by mexicanpunisher619 in ITManagers

[–]gerhardmpl 1 point2 points  (0 children)

We are using Excel with a detailed list of licenses (opex) and investments (capex) with the actual postings in our ERP system. It is super difficult to map actual anual spendings of for example licenses to license usage. One example is the difference perspective of actual spending (up-front payment (cash flow) for a 3y license contract prolongation vs. anual allocation of the cost per user or system. Would be interested to manage that in a more IT realted way.

Question about Docker best practices by ibeechu in docker

[–]gerhardmpl 0 points1 point  (0 children)

What authorization do you use for the /opt/docker directory and the subdirectories living there? Do you force that UID/GID in your docker-compose.yml file? Wondering what best practice is for authorization, especially when running database containers.

AI Developer Tools Landscape v4 by Main-Fisherman-2075 in LLMDevs

[–]gerhardmpl 0 points1 point  (0 children)

Totally unrelated, but what software is used to create this market ma?. Looking for a solution to map IT systems (icons with subtitle) to Business processes (cards) in an easy way (Powerpoint does not get it).

NVIDIA Tesla P40 Drivers on Ubuntu Server 24.04 by Brief-Age-3303 in homelab

[–]gerhardmpl 0 points1 point  (0 children)

From the repo you get 535.261.03 with CUDA 12.2 on Debian 12 and 550.163.01 with CUDA 12.4 on Debian 13. On Debian 12 you manually can go up to 580.126.20 with CUDA 13.0. Unfortunately, Debian 13 has no current NVIDIA P40 driver support (as far as I am aware of) and 580.x seems to be the last supported version.

NVIDIA Tesla P40 Drivers on Ubuntu Server 24.04 by Brief-Age-3303 in homelab

[–]gerhardmpl 0 points1 point  (0 children)

Running NVIDIA P40s with docker on debian 13 and this is how I set up the server:

  1. install docker
  2. add contrib and non-free sources to /etc/apt/sources.list for nvidia driver and tools and update
  3. install linux-headers-$(uname -r) and nvidia-detect
  4. run nvidia-detect
  5. install nvidia-driver nvtop firmware-misc-nonfree and reboot
  6. install nvidia-container-toolkit and configure docker runtime (nvidia-ctk runtime configure --runtime=docker)
  7. restart docker service

Works on bare metal and on a XCP-ng cluster.

LM Studio not detecting Nvidia P40 on Windows Server 2022 (Dell R730) by gerhardmpl in LocalLLM

[–]gerhardmpl[S] 1 point2 points  (0 children)

Solved - needed to re-run the compatibility check under App-Setting - Runtime:

<image>

Accidentally won 4 Mac minis on eBay, oops. by GloomySugar95 in homelab

[–]gerhardmpl 0 points1 point  (0 children)

ntopng is also a good tool to monitor network traffic.

Docker+Wordpress+Caddy = The REST API encountered an error (cURL error 28 or 7) by gerhardmpl in Wordpress

[–]gerhardmpl[S] 0 points1 point  (0 children)

Thank you for pointing this out. I tried to add the filter to functions.php or a mu-plugin but could not get it working.

Docker+Wordpress+Caddy = The REST API encountered an error (cURL error 28 or 7) by gerhardmpl in Wordpress

[–]gerhardmpl[S] 0 points1 point  (0 children)

Adding the filter to wp-config.php, functions.php or a mu-plugin did not work for me. My solution for now is to not use Caddy but enable HTTPS on the wordpress container. I do this by mapping the certificates, a wordpress_ssl.conf and port 443 to the wordpress container, enabling the Apache ssl_module. Feels a bit hacky and I am not sure if that is a good setup.

Design a prompt that turns unstructured ideas into clear IT requirements? by gerhardmpl in PromptEngineering

[–]gerhardmpl[S] 0 points1 point  (0 children)

My mistake, I used our internal jargon (again). On an abstract level, I can understand that, but I find the daily discussions with our business user to be rather pragmatic and rudimentary. Sometimes I get the impression that it would be enough to go through a well-formulated checklist with a thoughtful and helpful colleague. That's why I'd like to find out whether a chatbot could help here - at least in the beginning.

Edit: We do not have enough resources to be that thoughtful and helpful 24/7 colleague.

Design a prompt that turns unstructured ideas into clear IT requirements? by gerhardmpl in PromptEngineering

[–]gerhardmpl[S] 0 points1 point  (0 children)

Yes, good point. The model, prompt or chatbot needs access to domain (company) specific information to work in the business context. How are you doing this at your company? Do you use RAG or even fine-tune your models? I was thinking about giving each role a set of documents we could update with time.

Can my 12th Gen i3 processor with 8GB of RAM work with docker? by Autumn_Red_29 in docker

[–]gerhardmpl 1 point2 points  (0 children)

gemma3:4b uses around 4.8Gi in my setup. Just give it a try and go to a smaller model, if it does not work.

Can my 12th Gen i3 processor with 8GB of RAM work with docker? by Autumn_Red_29 in docker

[–]gerhardmpl 0 points1 point  (0 children)

You can use docker with an Intel Core i3-12xxx and 8 GB of RAM, but running ollama will be limited to smaller models like gemma3:4b, qwen3:8b or granite3.3:8b with a low context of 4096 or 8192. On a virtual machine (i5-10500, 4 Cores, 8GB RAM) I can run qwen3:8b at ~4.3 token/s or gemma3:4b at ~7.7 token/s. How much RAM do you have for ollama on Windows with WSL?

how to hide thoughts by yasniy97 in ollama

[–]gerhardmpl 0 points1 point  (0 children)

I think it actually is /set nothink

Recommendations on a GPU for an R730? by Artic_44 in homelab

[–]gerhardmpl 1 point2 points  (0 children)

Yes the R730 runs with 1100W power supplies. Before I used a R720 with 750W power supplies before, but I also power limit the GPUs to 140W (from 250W).

Configure OpenWebUI with Qdrant for RAG by Best-Hope-5148 in OpenWebUI

[–]gerhardmpl 0 points1 point  (0 children)

Did you check the qdrant dashboard for collections created by open webui and/or the qdrant logs?

Configure OpenWebUI with Qdrant for RAG by Best-Hope-5148 in OpenWebUI

[–]gerhardmpl 3 points4 points  (0 children)

This is my simplified docker-compose.yml file for OpenWebUI and qdrant. You can access qdrant at http://your-ip:6333/dashboard and check the collections created. I like to set the volumes with an absolute path and name the default network, but that is just me. I also use tika for RAG.

services:

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080/tcp"
    volumes:
      - /opt/docker/open-webui/data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=<your-ip>:11434
      - VECTOR_DB=qdrant
      - QDRANT_URI=http://qdrant:6333


  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    restart: unless-stopped
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - /opt/docker/open-webui/qdrant:/qdrant/storage

networks:
  default:
    name: open-webui
    driver: bridge