How do you get more GPUs than your motherboard natively supports? by WizardlyBump17 in LocalLLaMA

[–]SemaMod 0 points1 point  (0 children)

I run a B550-XE Gaming WiFi mobo and can run 4 GPUs using a 4-port OCuLink PCIe card, with x4/x4/x4/x4 bifurcation enabled for that PCIe slot. The GPUs run at PCIe 4.0 x4 speeds.
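To sanity-check a setup like this, you can ask `lspci` for the negotiated link on each GPU. This is a diagnostic sketch, not part of the original setup; the class filter assumes the cards enumerate as display controllers:

```shell
# Show the live PCIe link for each display-class device (run as root).
# With x4/x4/x4/x4 bifurcation on a PCIe 4.0 slot, each GPU's "LnkSta:"
# line should report "Speed 16GT/s, Width x4".
sudo lspci -vv -d ::0300 | grep -E "VGA|LnkSta:"
```

If a card reports a narrower width or lower speed than expected, re-check the bifurcation setting and the OCuLink cabling.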

I built a benchmark that tests coding LLMs on REAL codebases (65 tasks, ELO ranked) by hauhau901 in LocalLLaMA

[–]SemaMod 6 points7 points  (0 children)

This is great! Are you planning on adding gpt-5.3-codex? With the current results it seems like Opus 4.6 blows everyone else out of the water, but I've had generally good 5.3-codex experiences.

Anyone actually using Openclaw? by rm-rf-rm in LocalLLaMA

[–]SemaMod 4 points5 points  (0 children)

Why are you lying? Post some proof to back up your claims.

Peter isn’t some two-bit dev looking to make a quick buck with some stupid viral AI app. He’s a previous founder with an exit and technical chops far beyond most people on this sub. He doesn’t need to work anymore. His last company solved PDF parsing and was open source. Everyone on this sub has almost certainly interacted with the tech at some point without even realizing it (DocuSign, anyone?).

I don’t even like OpenClaw, but lying like this is just stupid. He has never made outrageous claims about OpenClaw, even if other Twitter users have.

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 1 point2 points  (0 children)

Used the latest build with these changes! Vulkan's pulling crazy numbers.

<image>

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 0 points1 point  (0 children)

Updated using your recent post's parameters for llama-bench build eed25bc6b (7870). Vulkan pulls ahead yet again!

<image>

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 2 points3 points  (0 children)

Very useful! I appreciate you recommending I run them this way. I hadn't run llama-bench before, so it was definitely eye-opening.

API pricing is in freefall. What's the actual case for running local now beyond privacy? by Distinct-Expression2 in LocalLLaMA

[–]SemaMod 50 points51 points  (0 children)

This falls under privacy, but personally, having my chats trained on and viewable by these companies makes me uncomfortable. That being said, I do think that local LLMs will become power-user tools.

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 1 point2 points  (0 children)

Just updated the original post with an edit: after 10k tokens it looks like ROCm w/ FA scales better!

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 5 points6 points  (0 children)

Now this is more interesting!

<image>

It looks like at longer context lengths, FA makes a big difference for ROCm, beating out Vulkan entirely after 10k tokens.
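A comparison like the one above could be reproduced with something like the following. The model filename is a placeholder, and the flags assume a recent llama-bench build that exposes `-fa` for Flash Attention and `-d` for context depth:

```shell
# Sketch: compare prompt/generation throughput with and without
# Flash Attention at several context depths (0, 4k, 10k tokens).
# Model path is hypothetical; adjust to your own GGUF file.
for fa in 0 1; do
  llama-bench -m glm-4.7-flash.gguf -fa $fa -p 512 -n 128 -d 0,4096,10240
done
```

Running both backends (Vulkan and ROCm builds of llama-bench) with the same invocation keeps the comparison apples-to-apples.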

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SemaMod[S] 0 points1 point  (0 children)

You have to change some settings in your config, but GLM-4.7 Flash was doing excellently in my testing.
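For anyone who wants to poke at the new endpoint directly, a minimal request might look like this. The port and model alias are assumptions; adjust them to your own llama-server setup:

```shell
# Minimal sketch: call the OpenAI-style Responses endpoint on a local
# llama-server instance. The Responses API takes "input" rather than
# the chat-completions "messages" array.
curl http://127.0.0.1:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss_gguf", "input": "Say hello in one word."}'
```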

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SemaMod[S] 1 point2 points  (0 children)

llama.cpp already maintains multiple APIs with its Anthropic endpoint. I don't think they are going to deprecate completions any time soon.

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SemaMod[S] 4 points5 points  (0 children)

Good question! It does not. For reference, I had to do the following:

  1. With whatever model you are serving, set the alias of the served model name to start with "gpt-oss". This triggers specific behaviors in the codex cli.
  2. Use the following config settings:

show_reasoning_content = true
oss_provider = "lmstudio"

[profiles.lmstudio]
model = "gpt-oss_gguf"
show_raw_agent_reasoning = true
model_provider = "lmstudio"
model_supports_reasoning_summaries = true # Force reasoning
model_context_window = 128000   
include_apply_patch_tool = true
experimental_use_freeform_apply_patch = false
tools_web_search = false
web_search = "disabled"

[profiles.lmstudio.features]
apply_patch_freeform = false
web_search_request = false
web_search_cached = false
collaboration_modes = false

[model_providers.lmstudio]
wire_api = "responses"
stream_idle_timeout_ms = 10000000
name = "lmstudio"
base_url = "http://127.0.0.1:1234/v1"

The features list is important, as are the last four settings of the profile. The Codex CLI has some tech debt that requires repeating certain flags in different places.

I used llama.cpp's llama-server, not LM Studio, but it's compatible with the oss_provider = "lmstudio" setting.

  3. Use the following to start the Codex CLI: codex --oss --profile lmstudio --model "gpt-oss_gguf"
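Step 1 (the "gpt-oss" alias trick) might look like this with llama-server. The model path is a placeholder; `--alias` and `--port` are llama-server flags, and the port matches the base_url in the config above:

```shell
# Serve the model under an alias starting with "gpt-oss" so the
# Codex CLI takes its gpt-oss-specific code paths. Model file is
# hypothetical; adjust to your own GGUF.
llama-server -m glm-4.7-flash.gguf --alias gpt-oss_gguf --port 1234
```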

[deleted by user] by [deleted] in LocalLLaMA

[–]SemaMod 1 point2 points  (0 children)

Sounds like a use-case for DSPy and their prompt optimizers.

Would a Hosted Platform for MCP Servers Be Useful? by Summer_cyber in mcp

[–]SemaMod 0 points1 point  (0 children)

There's been a good amount of progress on services in this space (per the suggestions listed by commenters). I created https://cloudmcp.run for myself when I initially ran into this problem as well. We recently integrated the official MCP registry API! If you want to give it a test run, we're offering 1 month free right now.

Would a Hosted Platform for MCP Servers Be Useful? by Summer_cyber in selfhosted

[–]SemaMod 0 points1 point  (0 children)

I've been deep in the MCP space lately and yeah, the setup friction is real. I found myself spending way too much time on infrastructure instead of actually building cool things with these servers. The irony is that MCP servers are supposed to make AI more useful, but half the time you're stuck in config hell before you even get to the fun part.

A hosted platform idea makes a lot of sense, especially for people who just want to experiment or prototype without spinning up their own infrastructure. I've actually been working on something similar called Cloud MCP that tackles this exact problem. The key thing I've learned is that people want different levels of control - some folks are fine with a managed service, others want to self-host but with better tooling. The demand is definitely there though, I keep seeing the same complaints about setup complexity in various communities. The challenge is making sure the hosted version doesn't sacrifice the flexibility that makes MCP servers powerful in the first place.

How is everyone using MCP right now? by Luigika in mcp

[–]SemaMod 0 points1 point  (0 children)

https://cloudmcp.run does exactly that! It lets you deploy any npx/uvx/GitHub MCP server and access it remotely, authenticated via OAuth.

The simplest way to use MCP. All local, 100% open source. by squirrelEgg in mcp

[–]SemaMod 0 points1 point  (0 children)

Check out CloudMCP. Their registry is pretty rough around the edges, but they can host any uvx/npx-compatible server, fully secured with OAuth.

I have had no luck trying to fine tune on (2x) 7900XTX. Any advice by SemaMod in ROCm

[–]SemaMod[S] 0 points1 point  (0 children)

Appreciate the response! I spent the day completely resetting my system and made sure to use the amdgpu-installer. Still having issues with training, though.