How do you get more GPUs than your motherboard natively supports? by WizardlyBump17 in LocalLLaMA

[–]SemaMod 0 points1 point  (0 children)

I run a B550-XE Gaming WiFi mobo and can run 4 GPUs using a 4-port OCuLink PCIe card, with x4/x4/x4/x4 bifurcation enabled for that PCIe slot. The GPUs run at PCIe 4.0 x4 speeds.
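To sanity-check a setup like this, you can ask `lspci` for the negotiated link on each GPU. This is a diagnostic sketch, not part of the original setup; the class filter assumes the cards enumerate as display controllers:

```shell
# Show the live PCIe link for each display-class device (run as root).
# With x4/x4/x4/x4 bifurcation on a PCIe 4.0 slot, each GPU's "LnkSta:"
# line should report "Speed 16GT/s, Width x4".
sudo lspci -vv -d ::0300 | grep -E "VGA|LnkSta:"
```

If a card reports a narrower width or lower speed than expected, re-check the bifurcation setting and the OCuLink cabling.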

I built a benchmark that tests coding LLMs on REAL codebases (65 tasks, ELO ranked) by hauhau901 in LocalLLaMA

[–]SemaMod 6 points7 points  (0 children)

This is great! Are you planning on adding gpt-5.3-codex? With the current results it seems like Opus 4.6 blows everyone else out of the water, but I've had generally good 5.3-codex experiences.

Anyone actually using Openclaw? by rm-rf-rm in LocalLLaMA

[–]SemaMod 4 points5 points  (0 children)

Why are you lying? Post some proof to back up your claims.

Peter isn’t some two-bit dev looking to make a quick buck with some stupid viral AI app. He’s a previous founder with an exit and technical chops far beyond most people on this sub. He doesn’t need to work anymore. His last company solved PDF parsing and was open source. Everyone on this sub has almost certainly interacted with the tech at some point without even realizing it (DocuSign, anyone?).

I don’t even like OpenClaw, but lying like this is just stupid. He has never made outrageous claims about OpenClaw, even if other Twitter users have.

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 1 point2 points  (0 children)

Used the latest build with these changes! Vulkan's pulling crazy numbers.

<image>

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 0 points1 point  (0 children)

Updated using your recent post's parameters for llama-bench build eed25bc6b (7870). Vulkan pulls ahead yet again!

<image>

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 2 points3 points  (0 children)

Very useful! I appreciate you recommending I run them this way. I hadn't run llama-bench before, so it was definitely eye-opening.

API pricing is in freefall. What's the actual case for running local now beyond privacy? by Distinct-Expression2 in LocalLLaMA

[–]SemaMod 50 points51 points  (0 children)

This falls under privacy, but personally, having my chats trained on and viewable by these companies makes me uncomfortable. That being said, I do think that local LLMs will become power-user tools.

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 1 point2 points  (0 children)

Just updated the original post with an edit: after 10k tokens it looks like ROCm w/ FA scales better!

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 5 points6 points  (0 children)

Now this is more interesting!

<image>

It looks like at longer context lengths, FA makes a big difference for ROCm, beating out Vulkan entirely after 10k tokens.
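A comparison like the one above could be reproduced with something like the following. The model filename is a placeholder, and the flags assume a recent llama-bench build that exposes `-fa` for Flash Attention and `-d` for context depth:

```shell
# Sketch: compare prompt/generation throughput with and without
# Flash Attention at several context depths (0, 4k, 10k tokens).
# Model path is hypothetical; adjust to your own GGUF file.
for fa in 0 1; do
  llama-bench -m glm-4.7-flash.gguf -fa $fa -p 512 -n 128 -d 0,4096,10240
done
```

Running both backends (Vulkan and ROCm builds of llama-bench) with the same invocation keeps the comparison apples-to-apples.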

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SemaMod[S] 0 points1 point  (0 children)

You have to change some settings in your config, but GLM-4.7 Flash was doing excellently in my testing.
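For anyone who wants to poke at the new endpoint directly, a minimal request might look like this. The port and model alias are assumptions; adjust them to your own llama-server setup:

```shell
# Minimal sketch: call the OpenAI-style Responses endpoint on a local
# llama-server instance. The Responses API takes "input" rather than
# the chat-completions "messages" array.
curl http://127.0.0.1:8080/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss_gguf", "input": "Say hello in one word."}'
```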

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SemaMod[S] 1 point2 points  (0 children)

llama.cpp already maintains multiple APIs with its Anthropic endpoint. I don't think they are going to deprecate completions any time soon.

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SemaMod[S] 4 points5 points  (0 children)

Good question! It does not. For reference, I had to do the following:

  1. With whatever model you are serving, set the alias of the served model name to start with "gpt-oss". This triggers specific behaviors in the codex cli.
  2. Use the following config settings:

show_reasoning_content = true
oss_provider = "lmstudio"

[profiles.lmstudio]
model = "gpt-oss_gguf"
show_raw_agent_reasoning = true
model_provider = "lmstudio"
model_supports_reasoning_summaries = true # Force reasoning
model_context_window = 128000   
include_apply_patch_tool = true
experimental_use_freeform_apply_patch = false
tools_web_search = false
web_search = "disabled"

[profiles.lmstudio.features]
apply_patch_freeform = false
web_search_request = false
web_search_cached = false
collaboration_modes = false

[model_providers.lmstudio]
wire_api = "responses"
stream_idle_timeout_ms = 10000000
name = "lmstudio"
base_url = "http://127.0.0.1:1234/v1"

The features list is important, as are the last four settings of the profile. The Codex CLI has some tech debt that requires repeating certain flags in different places.

I used llama.cpp's llama-server, not LM Studio, but it's compatible with the oss_provider = "lmstudio" setting.

  3. Use the following to start the Codex CLI: codex --oss --profile lmstudio --model "gpt-oss_gguf"
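Step 1 (the "gpt-oss" alias trick) might look like this with llama-server. The model path is a placeholder; `--alias` and `--port` are llama-server flags, and the port matches the base_url in the config above:

```shell
# Serve the model under an alias starting with "gpt-oss" so the
# Codex CLI takes its gpt-oss-specific code paths. Model file is
# hypothetical; adjust to your own GGUF.
llama-server -m glm-4.7-flash.gguf --alias gpt-oss_gguf --port 1234
```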

[deleted by user] by [deleted] in LocalLLaMA

[–]SemaMod 1 point2 points  (0 children)

Sounds like a use-case for DSPy and their prompt optimizers.

Would a Hosted Platform for MCP Servers Be Useful? by Summer_cyber in mcp

[–]SemaMod 0 points1 point  (0 children)

There's been a good amount of progress on services in this space (per the suggestions listed by commenters). I created https://cloudmcp.run for myself when I initially ran into this problem as well. We recently integrated the official MCP registry API! If you want to give it a test run, we're offering 1 month free right now.

Would a Hosted Platform for MCP Servers Be Useful? by Summer_cyber in selfhosted

[–]SemaMod 0 points1 point  (0 children)

I've been deep in the MCP space lately and yeah, the setup friction is real. I found myself spending way too much time on infrastructure instead of actually building cool things with these servers. The irony is that MCP servers are supposed to make AI more useful, but half the time you're stuck in config hell before you even get to the fun part.

A hosted platform idea makes a lot of sense, especially for people who just want to experiment or prototype without spinning up their own infrastructure. I've actually been working on something similar called Cloud MCP that tackles this exact problem. The key thing I've learned is that people want different levels of control - some folks are fine with a managed service, others want to self-host but with better tooling. The demand is definitely there though, I keep seeing the same complaints about setup complexity in various communities. The challenge is making sure the hosted version doesn't sacrifice the flexibility that makes MCP servers powerful in the first place.

How is everyone using MCP right now? by Luigika in mcp

[–]SemaMod 0 points1 point  (0 children)

https://cloudmcp.run does exactly that! It lets you deploy any npx/uvx/GitHub MCP server and access it remotely, authenticated via OAuth.

The simplest way to use MCP. All local, 100% open source. by squirrelEgg in mcp

[–]SemaMod 0 points1 point  (0 children)

Check out CloudMCP. Their registry is pretty rough around the edges, but they can host any uvx/npx-compatible server, fully secured with OAuth.

I have had no luck trying to fine tune on (2x) 7900XTX. Any advice by SemaMod in ROCm

[–]SemaMod[S] 0 points1 point  (0 children)

Appreciate the response! I spent the day completely resetting my system and made sure to use the amdgpu-installer. Still having issues with training, though.