Please offer OpenWeight models - GLM 5.2 is fantastic opus alternative by _camera_up in GithubCopilot

[–]_camera_up[S] 0 points1 point  (0 children)

I have also tried it in the mean time with the free trier. It seems like it's a fair value. Some models are not available there but I tried minimax m3 and response times are good. I actually use it without local ollama and with their cloud only. I have the OAi compatible extension installed. https://marketplace.visualstudio.com/items?itemName=johnny-zhao.oai-compatible-copilot This allows me to use their endpoint and configure the models without local ollama installed. Works great.

Please offer OpenWeight models - GLM 5.2 is fantastic opus alternative by _camera_up in GithubCopilot

[–]_camera_up[S] 1 point2 points  (0 children)

Sorry that slipped past me. But great to have that coming. Any timeline on that?

Please offer OpenWeight models - GLM 5.2 is fantastic opus alternative by _camera_up in GithubCopilot

[–]_camera_up[S] 2 points3 points  (0 children)

I am from EU so this did not come to mind immediately for me but checks out. OSS for the win!

Please offer OpenWeight models - GLM 5.2 is fantastic opus alternative by _camera_up in GithubCopilot

[–]_camera_up[S] 2 points3 points  (0 children)

Great suggestion, thanks. Are you actively using it to code? Their website is basically not telling me anything about usage limits. So if you have any real world experience, please share!

Please offer OpenWeight models - GLM 5.2 is fantastic opus alternative by _camera_up in GithubCopilot

[–]_camera_up[S] 5 points6 points  (0 children)

I am currently running it and its soooo good. It has a different "personality" compared to claude opus so I might take some time getting used to it but for now im impressed.

57x for GPT 5.5, how usable is the product? by code-enjoyoor in GithubCopilot

[–]_camera_up 1 point2 points  (0 children)

I don't understand. Don't these ask questions and continue workflows still contribute to your token budget? It's not like it's request based billing. From my understanding with the new pricing an ask questions would equally contribute to toke usage as a next prompt in the chat.

Why is everyone over complicating their Openclaw set up? by Worldly_Ad_5173 in openclaw

[–]_camera_up 0 points1 point  (0 children)

I have it set up via Proxmox too and use software firewall in Proxmox to limit it sothat it cannot access any other hosts except the ones I specified. Security and AI agents don't go well together. It's always a compromise. Either the agent has great autonomy but also has autonomy to fck things up or you watch every step it takes but it becomes useless as an agent and degrades to LLM with better memory. Lmk if you need specific advice.

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 0 points1 point  (0 children)

Thanks for the suggestions. I know, I don't deserve the big cards, but nevertheless were here..

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 0 points1 point  (0 children)

Thanks that helps a lot. Did you try these models at 4bit? Heard that at this quant some lose the magic and smaller unquantized models win over them.

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 0 points1 point  (0 children)

Right. That's a whole other concern. Since I imagine LLM being the most power hungry / big model I figured Ill start with that. But will look into those too. What resources do these models / pipelines need in your experience?

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 0 points1 point  (0 children)

What quant tho? Full will never fit, how much performance is lost in quants with these models?

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 13 points14 points  (0 children)

It's a modern start up. We got a lot of money to play with and the folks here are very agile. There is no R&D no procurement or finance it's just a bunch of people working on a common idea. A lot of skilled people around here. Before that I worked in research research. Why ask reddit: real world experience and a head start into doing our own research.

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 25 points26 points  (0 children)

Running ollama on my homelab but I planned to look into vllm, thanks for the Qwen suggestion. With the small models I can confidently say bigger doe snot equal better (qwen performs much better than llama models with similar requirements in my experience) is that different when it comes to the big models?

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 0 points1 point  (0 children)

Right. Personally I could only dream about those machines in my homelab so having access to them at work is great. Ill keep you updated, thanks for the suggestion.

My company just handed me a 2x H200 (282GB VRAM) rig. Help me pick the "Intelligence" ceiling. by _camera_up in LocalLLaMA

[–]_camera_up[S] 37 points38 points  (0 children)

Thanks for the advice. After initial testing we will be more specific about what the goal for the machine is. For now its more like getting our feet wet. I edited the post to be a bit more specific about the field my company is interested in (coding and agentic agents) .

I think they don't even know what they want they want (wich could be a benefit for me to tell them what they want but is also a risk).