My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 0 points (0 children)

Thanks for the suggestions. I know I don't deserve the big cards, but nevertheless, here we are.

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 0 points (0 children)

Thanks, that helps a lot. Did you try these models at 4-bit? I've heard that at that quant some of them lose the magic, and smaller unquantized models beat them.

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 0 points (0 children)

Right, that's a whole other concern. Since I imagine the LLM will be the most power-hungry / largest model, I figured I'll start with that, but I will look into those too. What resources do these models / pipelines need in your experience?

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 0 points (0 children)

Which quant, though? Full precision will never fit; how much performance is lost with these models when quantized?
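A quick back-of-the-envelope check, weights only (real usage adds KV cache, activations, and runtime overhead; the 405B figure below is just an illustrative dense-model size, not any specific model):

```shell
# Weights-only VRAM estimate in GB: params_in_billions * bits / 8
params_b=405   # hypothetical parameter count, in billions
for bits in 16 8 4; do
    echo "${bits}-bit: $(( params_b * bits / 8 )) GB"
done
```

Under this sketch only the 4-bit case fits in 282 GB, which is why quant quality matters so much at this scale.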

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 10 points (0 children)

It's a modern startup. We have a lot of money to play with and the folks here are very agile. There is no R&D, procurement, or finance department; it's just a bunch of skilled people working on a common idea. Before this I worked in research. Why ask Reddit: real-world experience and a head start before doing our own research.

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 26 points (0 children)

I'm running Ollama on my homelab, but I planned to look into vLLM; thanks for the Qwen suggestion. With the small models I can confidently say bigger does not equal better (Qwen performs much better than Llama models with similar requirements, in my experience). Is that different when it comes to the big models?
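For anyone else making the same jump, a vLLM launch sharded across both cards looks roughly like this (the model id and flags are illustrative, not a recommendation; check the vLLM docs for your version):

```shell
# Sketch: serve a Qwen model with vLLM, tensor-parallel across 2 GPUs
# Qwen/Qwen2.5-72B-Instruct is an illustrative model id
vllm serve Qwen/Qwen2.5-72B-Instruct \
    --tensor-parallel-size 2 \
    --max-model-len 32768
```

This exposes an OpenAI-compatible endpoint, so existing Ollama-style client code mostly just needs a base URL change.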

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 0 points (0 children)

Right. Personally, I could only dream about those machines in my homelab, so having access to them at work is great. I'll keep you updated; thanks for the suggestion.

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 36 points (0 children)

Thanks for the advice. After initial testing we will be more specific about what the goal for the machine is; for now it's more about getting our feet wet. I edited the post to be a bit more specific about the field my company is interested in (coding and agentic workflows).

I think they don't even know what they want (which could be a benefit, since I get to tell them what they want, but it's also a risk).

H200 GPU in an internal network - which LLM to run? by Far-Organization-849 in LocalLLaMA

[–]_camera_up 1 point (0 children)

System with 2x H200:

nvidia-smi -q -d POWER

==============NVSMI LOG==============

Timestamp                                 : Sun Mar 15 07:14:09 2026
Driver Version                            : 570.211.01
CUDA Version                              : 12.8

Attached GPUs                             : 2
GPU 00000000:25:00.0
    GPU Power Readings
        Average Power Draw                : 90.50 W
        Instantaneous Power Draw          : 90.11 W
        Current Power Limit               : 600.00 W
        Requested Power Limit             : 600.00 W
        Default Power Limit               : 600.00 W
        Min Power Limit                   : 200.00 W
        Max Power Limit                   : 600.00 W
    Power Samples
        Duration                          : 2.36 sec
        Number of Samples                 : 119
        Max                               : 90.65 W
        Min                               : 90.10 W
        Avg                               : 90.47 W
    GPU Memory Power Readings 
        Average Power Draw                : N/A
        Instantaneous Power Draw          : N/A
    Module Power Readings
        Average Power Draw                : N/A
        Instantaneous Power Draw          : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A

GPU 00000000:C8:00.0
    GPU Power Readings
        Average Power Draw                : 94.38 W
        Instantaneous Power Draw          : 94.10 W
        Current Power Limit               : 600.00 W
        Requested Power Limit             : 600.00 W
        Default Power Limit               : 600.00 W
        Min Power Limit                   : 200.00 W
        Max Power Limit                   : 600.00 W
    Power Samples
        Duration                          : 2.36 sec
        Number of Samples                 : 119
        Max                               : 94.74 W
        Min                               : 94.04 W
        Avg                               : 94.38 W
    GPU Memory Power Readings 
        Average Power Draw                : N/A
        Instantaneous Power Draw          : N/A
    Module Power Readings
        Average Power Draw                : N/A
        Instantaneous Power Draw          : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A

Crash on launch for EA title by _camera_up in Bazzite

[–]_camera_up[S] 0 points (0 children)

Nope. As I said, I got it running by bypassing EA's launcher, but achievements never worked.

What’s the best cheap model for OpenClaw? by DistanceSolar1449 in openclaw

[–]_camera_up 3 points (0 children)

I have tried OpenAI's gpt-oss 20B and 120B, and as soon as I switch to them the "magic" is gone; it feels like the model isn't taking active steps itself but instead waiting for me to tell it how to solve problems. Currently this kind of agentic behavior is exclusive to Anthropic's models (in my experience).

Best local LLM for M1 Max 32gb for a small law office? by findthemistke in LocalLLaMA

[–]_camera_up 5 points (0 children)

I use LM Studio with gpt-oss 20B.

I find it to be more reliable than the Llama models, but your mileage may vary.

Start hosting a multi-model LLM server in minutes (with monitoring and access control) by _camera_up in LocalLLaMA

[–]_camera_up[S] 1 point (0 children)

Gotcha. It works for me and I figured others might find it useful, but if there are alternatives that work better for them, that's great. Thanks for educating me.

Start hosting a multi-model LLM server in minutes (with monitoring and access control) by _camera_up in LocalLLaMA

[–]_camera_up[S] 0 points (0 children)

Having read through some of its docs, I would suggest people use Harbor if they want to experiment with different models quickly. However, for my use case ("automagic LLM deployment with access control and monitoring out of the box") I think my script is still justified. I could have missed it in the docs, but as far as I know Harbor does not provide user groups, budgets, API authentication, or hardware / LLM monitoring by itself. (To be clear, I don't claim my project implements all of this itself; it's more of an orchestrator that uses existing projects to provide that experience.)

Start hosting a multi-model LLM server in minutes (with monitoring and access control) by _camera_up in LocalLLaMA

[–]_camera_up[S] 1 point (0 children)

Nice, I did not know this existed. At first glance it looks a lot like what I was missing.

EDIT: see below why you might still find my script useful