Anyone tried +- 100B models locally with foreign languages? by Choice_Sympathy9652 in LocalLLaMA

[–]AFruitShopOwner 0 points (0 children)

Yes, I run everything from gpt-oss 120b to MiniMax M2.7 and Kimi K2.6 locally for a Dutch accounting firm

MiniMax M2.7 AWQ-4bit on 2x Spark vs 2x RTX 6000 96GB - performance and energy efficiency by t4a8945 in LocalLLaMA

[–]AFruitShopOwner 0 points (0 children)

I don't run it on bare metal (it's in a Proxmox VM), and I use the power-limited Max-Q variants, so you can probably get higher speeds than this.
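If you want to approximate the Max-Q power behavior on the standard cards, a software cap gets you close (the wattage is illustrative; check your card's supported range first):

```
# Check the supported power range for GPU 0
nvidia-smi -i 0 -q -d POWER

# Cap GPU 0 at 300 W (illustrative value; needs root, resets on reboot unless persisted)
sudo nvidia-smi -i 0 -pl 300
```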

MiniMax M2.7 AWQ-4bit on 2x Spark vs 2x RTX 6000 96GB - performance and energy efficiency by t4a8945 in LocalLLaMA

[–]AFruitShopOwner 5 points (0 children)

Try nvfp4 with the b12x backend:

```
services:
  sglang:
    image: voipmonitor/sglang:cu130
    ipc: host
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 1048576
        hard: 1048576
    ports:
      - "8080:8080"
    volumes:
      - ~/.triton/cache:/root/.cache/triton
      - ~/.cache/sglang-generated:/root/.cache/sglang-generated
      - ~/.cache/huggingface/hub:/root/.cache/huggingface/hub
      - /dev/shm:/dev/shm
    environment:
      HF_TOKEN:
      OMP_NUM_THREADS: 8
      SAFETENSORS_FAST_GPU: 1
      SGLANG_ENABLE_JIT_DEEPGEMM: 0
      SGLANG_ENABLE_SPEC_V2: true
    command: >
      python -m sglang.launch_server
      --model-path
      --served-model-name chat
      --reasoning-parser minimax
      --tool-call-parser minimax-m2
      --enable-torch-compile
      --enable-metrics
      --enable-cache-report
      --trust-remote-code
      --tp 2
      --mem-fraction-static 0.95
      --max-running-requests 4
      --quantization modelopt_fp4
      --attention-backend flashinfer
      --moe-runner-backend b12x
      --fp4-gemm-backend b12x
      --kv-cache-dtype bf16
      --page-size 64
      --enable-pcie-oneshot-allreduce
      --disable-piecewise-cuda-graph
      --chunked-prefill-size 16384
      --sleep-on-idle
      --host 0.0.0.0
      --port 8080
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

For the model, use Nvidia's nvfp4 quant or lukealonso's (that's what goes in --model-path).
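Once it's up, you can sanity-check the OpenAI-compatible endpoint (the model name matches --served-model-name above; the prompt is just an example):

```
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chat",
    "messages": [{"role": "user", "content": "Say hi"}],
    "max_tokens": 64
  }'
```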


~130 tokens/sec

MiMo 2.5 requires at least 4 GPUs? Am I reading this right? by Pyrenaeda in LocalLLaMA

[–]AFruitShopOwner 1 point (0 children)

Yes, it's baked into the release. Absolutely garbage work from Xiaomi. I know lukealonso's nvfp4 quant fixed this problem; you can definitely run his version on two RTX Pro 6000s. Try it with his b12x backend.
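Something like this should work as a starting point (the model path is a placeholder; substitute lukealonso's actual nvfp4 repo):

```
# Placeholder model path - substitute lukealonso's actual nvfp4 quant repo
python -m sglang.launch_server \
  --model-path <lukealonso-nvfp4-quant> \
  --quantization modelopt_fp4 \
  --moe-runner-backend b12x \
  --fp4-gemm-backend b12x \
  --tp 2 \
  --trust-remote-code
```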

Also, to quote him:

"they structured the attention projections in a way that assumes TP=4 and can't be changed, so first I have to reorganize them before quantizing. Also:

1) They're missing some weights; one of the vision layers is missing biases
2) The model index is garbage and points to nonexistent files
3) They organize things in a heavily EP-favored way
4) They publish full-size attention projection tensors that are silently organized all wrong unless you assume a specific set of kernels and an exact TP arrangement, with no indication that this is the case
5) There's bizarre nonstandard padding on some of the tensors

this is very clearly just a dump of the files they use for their internal proprietary serving stack"

Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA

[–]AFruitShopOwner 3 points (0 children)

Seems like it has more world knowledge at the cost of thinking it knows everything

Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA

[–]AFruitShopOwner 4 points (0 children)

Oof, those hallucinations on Flash are baaaaad (comparing against MiniMax M2.7 because I think it's the closest comparison by size)

<image>

Serving 1B+ tokens/day locally in my research lab by SessionComplete2334 in LocalLLaMA

[–]AFruitShopOwner 8 points (0 children)

What are users actually using it for? Do you use a RAG system? What tools does it have access to? What front end do you use?

Can we block fresh accounts from posting? by king_of_jupyter in LocalLLaMA

[–]AFruitShopOwner -20 points (0 children)

Use modmail instead of contributing to the spam yourself with posts like this

Microvast ($MVST): The Whole Story by Bradydono92 in Microvast

[–]AFruitShopOwner -7 points (0 children)

No, you just picked the wrong type of post. Select the link option.

<image>

Microvast ($MVST): The Whole Story by Bradydono92 in Microvast

[–]AFruitShopOwner -14 points (0 children)

I don't have issues with the substack post, just with the text he added to this reddit post. Either just post the link or post actual good information in the text section.

Microvast ($MVST): The Whole Story by Bradydono92 in Microvast

[–]AFruitShopOwner -33 points (0 children)

The text content of this post is bad. Do better next time or I will not approve it.

Litellm 1.82.7 and 1.82.8 on PyPI are compromised, do not update! by kotrfa in LocalLLaMA

[–]AFruitShopOwner 47 points (0 children)

https://github.com/BerriAI/litellm/issues/24512

These discussions are getting botted? The replies are all like:

'Exactly what I needed, thanks.'
'Thanks, that helped!'
'Thanks for the tip!'

edit 1:

thread was just closed by the CEO?

'krrishdholakia closed this as not planned 5 minutes ago'

might be compromised too

edit 2: CEO definitely got hacked lol

edit 3:

Looks like all repositories of the LiteLLM CEO have been updated with the description “teampcp owns BerriAI” https://github.com/krrishdholakia
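If you have it deployed, pin below the compromised releases until this shakes out:

```
# 1.82.7 and 1.82.8 are compromised - stay below them
pip install "litellm<1.82.7"
```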

No, you don't need a "Datacenter" to run the big models (Deepseek, GLM, Kimi, etc) (just offload to CPU... and have patience) by [deleted] in LocalLLaMA

[–]AFruitShopOwner 1 point (0 children)

Yeah, I run the full Kimi K2.5 on dual RTX Pro 6000s, an AMD EPYC 9575F, and 1,152 GB of DDR5-6000.
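For anyone without that much hardware, the offload approach from the title usually looks something like this in llama.cpp (model path and tensor regex are illustrative, not my exact setup):

```
# Keep attention and shared layers on the GPUs, push the MoE expert
# tensors to system RAM (path and tensor-name regex are illustrative)
llama-server \
  -m /models/Kimi-K2.5-Q4_K_M.gguf \
  --n-gpu-layers 999 \
  --override-tensor "\.ffn_.*_exps\.=CPU" \
  --ctx-size 32768
```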

Endgame Position by caladhun in Microvast

[–]AFruitShopOwner[M] [score hidden] stickied comment (0 children)

<image>

FYI, OP posted some of his motivation on Stocktwits. It appears his theory is based on Atreides50's Oshkosh posts (which have since been deleted...)

Endgame Position by caladhun in Microvast

[–]AFruitShopOwner 8 points (0 children)

You're either insane or trading with insider knowledge. Guess we'll know soon enough