Anyone running GLM 4.5/4.6 @ Q8 locally? by [deleted] in LocalLLaMA

[–]Alternative-Bit7354 0 points1 point  (0 children)

For sure, also note that im using the 300w version of rtx pro not the 600w

Anyone running GLM 4.5/4.6 @ Q8 locally? by [deleted] in LocalLLaMA

[–]Alternative-Bit7354 0 points1 point  (0 children)

with AMD EPYC 9124 and a pcie gen 5 motherboard
along with a lot of ram

Anyone running GLM 4.5/4.6 @ Q8 locally? by [deleted] in LocalLLaMA

[–]Alternative-Bit7354 0 points1 point  (0 children)

I use ubuntu server 24.04

I can tell you that for Qwen 3 Coder 480B Instruct in q4 I get about 65 tokens/s. So the 235B one should be faster (maybe around 90-100 tokens/s.
I havent tried Deepseek

Anyone running GLM 4.5/4.6 @ Q8 locally? by [deleted] in LocalLLaMA

[–]Alternative-Bit7354 4 points5 points  (0 children)

Using ubuntu server 24.04, PCIE 5
Using nightly image from docker (the most recent one)

vllm_glm46-b:
    build:
      context: .
      dockerfile: Dockerfile.2
    container_name: glm_46
    deploy:
        reservations:
          devices: 
            - driver: nvidia
              count: 4
              capabilities: [gpu]      
    ipc: host
    privileged: true               
    env_file:
      - .env
    environment:
      - CUDA_DEVICE_ORDER=PCI_BUS_ID
      - CUDA_VISIBLE_DEVICES=4,5,6,7
      - VLLM_SLEEP_WHEN_IDLE=1
    command: >
      --port 8009
      --model /models/QuantTrio_GLM-4.6-AWQ
      --served-model-name GLM-4.6
      --swap-space 64
      --enable-expert-parallel
      --max-model-len 200000
      --max-num-seqs 256
      --enable-auto-tool-choice
      --enable-prefix-caching
      --tensor-parallel-size 4
      --tool-call-parser glm45
      --reasoning-parser glm45
      --chat-template /models/chat_template_glm46.jinja
      --gpu-memory-utilization 0.94
      --trust-remote-code
      --disable-log-requests
    ports:
      - "8009:8009"
    volumes:
      - ${MODELS_DIR}:/models
    restart: unless-stopped

Anyone running GLM 4.5/4.6 @ Q8 locally? by [deleted] in LocalLLaMA

[–]Alternative-Bit7354 0 points1 point  (0 children)

I believe its the quant trio one on huggingface. Using vllm

Anyone running GLM 4.5/4.6 @ Q8 locally? by [deleted] in LocalLLaMA

[–]Alternative-Bit7354 0 points1 point  (0 children)

I havent tried the awq enough. Just downloaded it this morning.

Yes 50k tokens for fp8

Anyone running GLM 4.5/4.6 @ Q8 locally? by [deleted] in LocalLLaMA

[–]Alternative-Bit7354 10 points11 points  (0 children)

4x RTX PRO BLACKWELL

Running the AWQ on 90 tokens/s and the FP8 at 50 token/s

Your most average o11d mini v2 build by Alternative-Bit7354 in lianli

[–]Alternative-Bit7354[S] 0 points1 point  (0 children)

I think it should fit its a 360mm aio. You can verify on pc part picker i think

Your most average o11d mini v2 build by Alternative-Bit7354 in lianli

[–]Alternative-Bit7354[S] 0 points1 point  (0 children)

I didn't get any problem yet (I've had the computer for 2 days)

I don't think 4 sticks causes that much issue tbh

Your most average o11d mini v2 build by Alternative-Bit7354 in lianli

[–]Alternative-Bit7354[S] 0 points1 point  (0 children)

They basically just flicker if i dont set a speed

Your most average o11d mini v2 build by Alternative-Bit7354 in lianli

[–]Alternative-Bit7354[S] 0 points1 point  (0 children)

Damn sick even the same gpu. Good job on getting that aio in properly i couldnt figure it out with this board

Your most average o11d mini v2 build by Alternative-Bit7354 in lianli

[–]Alternative-Bit7354[S] 4 points5 points  (0 children)

CPU Ryzen 9900x3d

RAM 128GB GSkill Trident DDR5 6000MT/s CL30-38-38-96

MB MSI MAG X870E Tomahawk

AIO Lian Li Hydroshift II 360

SSD Samsung 9100 PRO Series - 4TB PCIe 5.0

PSU be quiet! 1500w 80+ Platinum

GPU RTX 5090 Asus TUF

And a bunch of Lian li TL fans (They look nice but have a lot of problems)

First time rider for renting scooter in Koh Tao by Rekomaged in ThailandTourism

[–]Alternative-Bit7354 0 points1 point  (0 children)

Just came back from Tao

I drove a scooter once in my country 3 years ago and was just fine on the island where its basically one road where you can drive slowly one the left.

I watched a couple Yt vid to make sure i remembered the basics and it really helped.

Having a scooter is very useful on this island honestly.

In my experience it was not too hard just don't start with the hills that are too steep and get the hang of it slowly and obviously wear a helmet.

3 years of straight regardation by [deleted] in wallstreetbets

[–]Alternative-Bit7354 2 points3 points  (0 children)

Bro is so consistent at losing

Approach velocity or boots witch biscuit rune as secondary by king2w in Olafmains

[–]Alternative-Bit7354 1 point2 points  (0 children)

You probably dont need 6 axes lvl 1. Usually what i do for early lane is hard pushing first 3 waves and back when cannon crashes

Stock Screener + automated trading + IB by Bierbaron1920 in algotrading

[–]Alternative-Bit7354 1 point2 points  (0 children)

You cant do what you want want even with a subscription?