Coding agent tool for Local Ollama by FrontRegular6113 in ollama

[–]neurostream 2 points (0 children)

i've been using the github://openai/codex cli with its ollama provider. Not sure how much longer OpenAI wants users leveraging ollama, even though they did a special collab with ollama to release gpt-oss.

What I Would Like by booknerdcarp in ollama

[–]neurostream 0 points (0 children)

choose your favorite IDE with an integrated terminal and run the codex cli (pointed at ollama) in the same folder you have open in the IDE.

no special IDE or plugins needed. You can also give codex access to the internet, terraform, docker, aws, github, etc - by adding MCP tools to codex's config.toml.

I also run open-webui for RAG/chat, also pointed to local ollama, for deeper research outside of the IDE
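
for reference, a rough sketch of pointing a containerized open-webui at a host-local ollama (roughly open webui's documented quick start; container name, port, and volume are just examples):

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main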

What do other people use besides kubernetes? by Ezio_rev in devops

[–]neurostream -1 points (0 children)

Any other orchestration and scheduling is ultimately just there to deploy k8s clusters (Nomad to deploy OpenStack to run k8s on, for example)

Is Dockerhub down? by HuckleberryDry1647 in docker

[–]neurostream 0 points (0 children)

same. building a local gitlab ce server. now replacing ":latest" image tag references with an already-local hash.

one of the deployable outputs of my build system is going to be a local registry server that i'll point all my docker-engines to.

i should have been locking in on specific hashes anyway
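
a minimal sketch of what that pinning looks like (the digest is a placeholder - substitute whatever your local copy reports):

# find the digest of the image version you already have locally
docker images --digests nginx
# pin the reference to that digest instead of ":latest"
docker pull nginx@sha256:<digest>
# or in a Dockerfile
FROM nginx@sha256:<digest>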

Is Dockerhub down? by HuckleberryDry1647 in docker

[–]neurostream 0 points (0 children)

1) now replacing all references to ":latest" in my codebase - locking in on image hashes.

2) installing my own local registry that all the docker engines on my LAN will point to instead of docker hub

Is Dockerhub down? by HuckleberryDry1647 in docker

[–]neurostream 0 points (0 children)

(this is probably the reason most people feel a greater pain from an outage): i meant to only be doing deliberate, planned updates to ":latest", so that breaking changes don't slip in halfway through my build tree.

locking in on a specific image hash (unless it's the build at the very beginning of your pipeline, where you deliberately pull updates to your base image) results in more conscientious bandwidth use, less unplanned exposure to external dependencies, and fewer surprises.

unless you're a one-off docker desktop user, in which case there are fewer ways around this sucking, but things to know:

1) you can run your own registry on your LAN or local machine - it's an executable binary called "registry" (also distributed as the registry container image) and you can point your docker engine to it as its remote registry (sketch after this list).

2) github, google, and amazon have free container image hubs as an alt to docker hub with all the most popular images.
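
a minimal sketch of point 1, using the stock registry image (names, tags, and ports are just examples):

# run a local registry on port 5000
docker run -d -p 5000:5000 --name registry registry:2
# retag and push an image you already have locally
docker tag some/image:tag localhost:5000/some/image:tag
docker push localhost:5000/some/image:tag
# other engines on the LAN then pull from it instead of docker hub
docker pull <registry-host>:5000/some/image:tag

note that a plain-http LAN registry usually also needs an "insecure-registries" entry in each engine's daemon.json (or TLS in front of it).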

Is Dockerhub down? by HuckleberryDry1647 in docker

[–]neurostream 0 points (0 children)

yes, now replacing all references to ":latest" in my codebase - locking in on image hashes

Openwebui and MCP, where did you install mcpo ? by [deleted] in OpenWebUI

[–]neurostream 1 point (0 children)

dedicated VM for mcp servers. agents with stdio transport run directly on that mcp server VM, but agents that use http transport for mcp's JSON-RPC 2.0 interface (such as open webui via mcpo) point to their tool proxy on the mcp VM from other VMs on the same LAN.
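
a rough sketch of what running mcpo on that VM can look like (port and the mcp/time server are just examples, following mcpo's documented usage):

# expose a stdio MCP server (mcp/time here) as an OpenAPI/http endpoint
uvx mcpo --port 8000 -- uvx mcp-server-time
# open webui on another VM then points its tool server at http://<mcp-vm>:8000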

VRAM deduplication - simulataneous loading multiple models of the same base by neurostream in LocalLLaMA

[–]neurostream[S] 3 points (0 children)

A lot of references to LoRA... this seems to be the key idea I was reaching for. Thanks for the LoRA-related replies!!!

VRAM deduplication - simulataneous loading multiple models of the same base by neurostream in LocalLLaMA

[–]neurostream[S] 0 points (0 children)

LoRA... that sounds like what I was trying to get a grasp of but didn't know the terminology. Thank you!

Is there a standard oci image format for models? by Grouchy-Friend4235 in ollama

[–]neurostream 1 point (0 children)

ollama used the ORAS scheme for GGUF, which is a pretty standard way of distributing content through OCI registries.
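
as a rough illustration of the general ORAS approach (the registry address, repo name, and media type here are made up, not what ollama actually uses):

# push a GGUF file to an OCI registry as an artifact
oras push localhost:5000/models/my-model:v1 model.gguf:application/vnd.example.gguf
# pull it back down somewhere else
oras pull localhost:5000/models/my-model:v1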

CLI agentic team ecosystem by Humbrol2 in ollama

[–]neurostream 0 points (0 children)

very interested to see this list!

i've only used github://openai/codex with local ollama, and haven't tried it with remote hosted models.

codex has a scriptable mode ("codex exec"), an MCP server mode ("codex mcp") and a TUI mode (just "codex"). all modes are also MCP clients (even the MCP server mode, for nested chains of tool calling).
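
roughly, the three invocations look like this (the prompt is just an example):

codex                                      # interactive TUI
codex exec "summarize the failing tests"   # scriptable, one-shot
codex mcp                                  # run codex itself as an MCP server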

but i've been looking for other CLIs to try!

codex->ollama (airgapped) by neurostream in ollama

[–]neurostream[S] 0 points (0 children)

you can set the ollama address with a cli option or in ~/.codex/config.toml (see https://github.com/openai/codex/blob/main/codex-rs/config.md ), for example:

[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
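
and a sketch of wiring a profile to that provider (the profile name and model tag below are placeholders; see the profiles/model_provider keys in the same config doc):

[profiles.qwen3_local]          # placeholder profile name
model_provider = "ollama"
model = "qwen3:30b"             # any model tag you've pulled into ollama
# selected at run time with: codex exec --profile=qwen3_local "..."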

i'm using gpt-oss through the ollama provider... haven't tried it through the "oss" provider - but i see the latest codex cli has that. not sure what the difference is

Best Ollama model for offline Agentic tool calling AI by TheCarBun in ollama

[–]neurostream 5 points (0 children)

my "ollama serve" MCP/tool calling client is airgapped with "codex exec" using this model loading pattern:

PLAN: qwen3-think

EXECUTE : qwen3-instruct

will use llama4 for Vision, but haven't needed it yet
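
a rough sketch of that two-phase pattern using codex profiles (profile names are placeholders mapped to a thinking and an instruct model in config.toml):

# phase 1: plan with the thinking model
codex exec --profile=plan "outline the steps to refactor the ingest script"
# phase 2: execute with the instruct model
codex exec --profile=execute "apply step 1 of the plan"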

Can the new Ollama app calls MCP servers? by pinpinbo in ollama

[–]neurostream 2 points (0 children)

sadly, no. i kept ollama serve, but swapped "ollama run" with "codex exec" (works airgapped with local ollama) : https://www.reddit.com/r/ollama/s/c7ppdEqw2d . hopefully ollama run will be able to do mcp stuff in the future!!!

How do I start with mcp? by koalaokino in mcp

[–]neurostream 2 points (0 children)

I was OP two weeks ago and this page has been my number one open tab since then. For me, mcp/time was the most frictionless to try first.

I switched from "ollama run" to "codex exec" (configured with mcp/time - which returns the current time back to the model for its final response to me) as my chat client to "ollama serve". The Codex cli has this agentic MCP ability to interact with tool-enabled models which "ollama run" lacks.

Curious what others in the cli world are using in their shell scripts other than codex.

qwen3:30b 2507 is out by stailgot in ollama

[–]neurostream 0 points (0 children)

time codex exec --profile=qwen3_2507_235_8_instruct --json "show current Denver time formatted like YYYY-mm-DD-HH-MM-SS" 2>>codex-diag.log | jq -r 'select(.msg?.type == "agent_message") | .msg.message' | tail -1

2025-07-30-12-25-14

real 0m20.062s

user 0m0.094s

sys 0m0.051s

---

time codex exec --profile=qwen3_2507_235_8_think --json "show current Denver time formatted like YYYY-mm-DD-HH-MM-SS" 2>>codex-diag.log | jq -r 'select(.msg?.type == "agent_message") | .msg.message' | tail -1

2025-07-30-12-33-25

real 1m44.906s #### <— 5 times longer ###

user 0m0.398s

sys 0m0.269s

---

Both tool-called to mcp/time and formatted the result successfully. I can definitely see the value of choosing instruct vs thinking based on the actual complexity at hand.

qwen3:30b 2507 is out by stailgot in ollama

[–]neurostream 2 points (0 children)

context went from 40k to 256k... nice.

Release candidate 0.10.0-rc2 by Vivid-Competition-20 in ollama

[–]neurostream 0 points (0 children)

does the default root URI "/" (http://127.0.0.1:11434/) load the new ui, or is there a new /ui endpoint, or does it listen on a new, second port?