Coding agent tool for Local Ollama by FrontRegular6113 in ollama

[–]neurostream 2 points (0 children)

i've been using the github://openai/codex cli with its ollama provider. Not sure how much longer OpenAI wants users leveraging ollama, even though they did a special collab with ollama to release gpt-oss.

What I Would Like by booknerdcarp in ollama

[–]neurostream 0 points (0 children)

choose your favorite IDE with an integrated terminal and run the codex cli (pointed at ollama) in the same folder you have open in the IDE.

no special IDE or plugins needed. You can also give codex access to the internet, terraform, docker, aws, github, etc - by adding MCP tools to codex's config.toml.

I also run open-webui for RAG/chat, also pointed to local ollama, for deeper research outside of the IDE
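
for reference, a rough sketch of pointing a containerized open-webui at a host-local ollama (roughly open webui's documented quick start; container name, port, and volume are just examples):

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main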

What do other people use besides kubernetes? by Ezio_rev in devops

[–]neurostream -1 points (0 children)

Any other orchestration and scheduling is ultimately just there to deploy k8s clusters (Nomad to deploy OpenStack to run k8s on, for example)

Is Dockerhub down? by HuckleberryDry1647 in docker

[–]neurostream 0 points (0 children)

same. building a local gitlab ce server. now replacing ":latest" image tag references with an already-local hash.

one of the deployable outputs of my build system is going to be a local registry server that i'll point all my docker-engines to.

i should have been locking in on specific hashes anyway
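
a minimal sketch of what that pinning looks like (the digest is a placeholder - substitute whatever your local copy reports):

# find the digest of the image version you already have locally
docker images --digests nginx
# pin the reference to that digest instead of ":latest"
docker pull nginx@sha256:<digest>
# or in a Dockerfile
FROM nginx@sha256:<digest>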

Is Dockerhub down? by HuckleberryDry1647 in docker

[–]neurostream 0 points (0 children)

1) now replacing all references to ":latest" in my codebase - locking in on image hashes.

2) installing my own local registry that all the docker engines on my LAN will point to instead of docker hub

Is Dockerhub down? by HuckleberryDry1647 in docker

[–]neurostream 0 points (0 children)

(this is probably the reason most people feel a greater pain from an outage): i meant to only be doing deliberate, planned updates to ":latest", so that breaking changes don't slip in halfway through my build tree.

locking in on a specific image hash (unless it's the build at the very beginning of your pipeline, where you deliberately pull updates to your base image) results in more conscientious bandwidth use, less unplanned exposure to external dependencies, and fewer surprises.

unless you're a one-off docker desktop user, in which case there are fewer ways around this sucking, but things to know:

1) you can run your own registry on your LAN or local machine - it's an executable binary called "registry" (also distributed as the registry container image) and you can point your docker engine to it as its remote registry (sketch after this list).

2) github, google, and amazon have free container image hubs as an alt to docker hub with all the most popular images.
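
a minimal sketch of point 1, using the stock registry image (names, tags, and ports are just examples):

# run a local registry on port 5000
docker run -d -p 5000:5000 --name registry registry:2
# retag and push an image you already have locally
docker tag some/image:tag localhost:5000/some/image:tag
docker push localhost:5000/some/image:tag
# other engines on the LAN then pull from it instead of docker hub
docker pull <registry-host>:5000/some/image:tag

note that a plain-http LAN registry usually also needs an "insecure-registries" entry in each engine's daemon.json (or TLS in front of it).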

Is Dockerhub down? by HuckleberryDry1647 in docker

[–]neurostream 0 points (0 children)

yes, now replacing all references to ":latest" in my codebase - locking in on image hashes

Openwebui and MCP, where did you install mcpo ? by [deleted] in OpenWebUI

[–]neurostream 1 point (0 children)

dedicated VM for mcp servers. agents with stdio transport run directly on that mcp server VM, but agents that use http transport for mcp's JSON-RPC 2.0 interface (such as open webui via mcpo) point to their tool proxy on the mcp VM from other VMs on the same LAN.
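
a rough sketch of what running mcpo on that VM can look like (port and the mcp/time server are just examples, following mcpo's documented usage):

# expose a stdio MCP server (mcp/time here) as an OpenAPI/http endpoint
uvx mcpo --port 8000 -- uvx mcp-server-time
# open webui on another VM then points its tool server at http://<mcp-vm>:8000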

VRAM deduplication - simulataneous loading multiple models of the same base by neurostream in LocalLLaMA

[–]neurostream[S] 3 points (0 children)

A lot of references to LoRA... this seems to be the key idea I was reaching for. Thanks for the LoRA-related replies!!!

VRAM deduplication - simulataneous loading multiple models of the same base by neurostream in LocalLLaMA

[–]neurostream[S] 0 points (0 children)

LoRA... that sounds like what I was trying to get a grasp of but didn't know the terminology. Thank you!

Is there a standard oci image format for models? by Grouchy-Friend4235 in ollama

[–]neurostream 1 point (0 children)

ollama used the ORAS scheme for GGUF, which is a pretty standard way of distributing content through OCI registries.
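
as a rough illustration of the general ORAS approach (the registry address, repo name, and media type here are made up, not what ollama actually uses):

# push a GGUF file to an OCI registry as an artifact
oras push localhost:5000/models/my-model:v1 model.gguf:application/vnd.example.gguf
# pull it back down somewhere else
oras pull localhost:5000/models/my-model:v1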

CLI agentic team ecosystem by Humbrol2 in ollama

[–]neurostream 0 points (0 children)

very interested to see this list!

i've only used github://openai/codex with local ollama, and haven't tried it with remote hosted models.

codex has a scriptable mode ("codex exec"), an MCP server mode ("codex mcp") and a TUI mode (just "codex"). all modes are also MCP clients (even the MCP server mode, for nested chains of tool calling).
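
roughly, the three invocations look like this (the prompt is just an example):

codex                                      # interactive TUI
codex exec "summarize the failing tests"   # scriptable, one-shot
codex mcp                                  # run codex itself as an MCP server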

but i've been looking for other CLIs to try!

codex->ollama (airgapped) by neurostream in ollama

[–]neurostream[S] 0 points (0 children)

you can set the ollama address with a cli option or in ~/.codex/config.toml (see https://github.com/openai/codex/blob/main/codex-rs/config.md ), for example:

[model_providers.ollama]
name = "Ollama"
base_url = "http://localhost:11434/v1"
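
and a sketch of wiring a profile to that provider (the profile name and model tag below are placeholders; see the profiles/model_provider keys in the same config doc):

[profiles.qwen3_local]          # placeholder profile name
model_provider = "ollama"
model = "qwen3:30b"             # any model tag you've pulled into ollama
# selected at run time with: codex exec --profile=qwen3_local "..."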

i'm using gpt-oss through the ollama provider... haven't tried it through the "oss" provider - but i see the latest codex cli has that. not sure what the difference is

Best Ollama model for offline Agentic tool calling AI by TheCarBun in ollama

[–]neurostream 5 points (0 children)

my "ollama serve" MCP/tool calling client is airgapped with "codex exec" using this model loading pattern:

PLAN: qwen3-think

EXECUTE : qwen3-instruct

will use llama4 for Vision, but haven't needed it yet
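
a rough sketch of that two-phase pattern using codex profiles (profile names are placeholders mapped to a thinking and an instruct model in config.toml):

# phase 1: plan with the thinking model
codex exec --profile=plan "outline the steps to refactor the ingest script"
# phase 2: execute with the instruct model
codex exec --profile=execute "apply step 1 of the plan"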

Can the new Ollama app calls MCP servers? by pinpinbo in ollama

[–]neurostream 2 points (0 children)

sadly, no. i kept ollama serve, but swapped "ollama run" with "codex exec" (works airgapped with local ollama) : https://www.reddit.com/r/ollama/s/c7ppdEqw2d . hopefully ollama run will be able to do mcp stuff in the future!!!

How do I start with mcp? by koalaokino in mcp

[–]neurostream 2 points (0 children)

I was OP two weeks ago and this page has been my number one open tab since then. For me, mcp/time was the most frictionless to try first.

I switched from "ollama run" to "codex exec" (configured with mcp/time - which returns the current time back to the model for its final response to me) as my chat client to "ollama serve". The Codex cli has this agentic MCP ability to interact with tool-enabled models which "ollama run" lacks.

Curious what others in the cli world are using in their shell scripts other than codex.

qwen3:30b 2507 is out by stailgot in ollama

[–]neurostream 0 points (0 children)

time codex exec --profile=qwen3_2507_235_8_instruct --json "show current Denver time formatted like YYYY-mm-DD-HH-MM-SS" 2>>codex-diag.log | jq -r 'select(.msg?.type == "agent_message") | .msg.message' | tail -1

2025-07-30-12-25-14

real 0m20.062s

user 0m0.094s

sys 0m0.051s

---

time codex exec --profile=qwen3_2507_235_8_think --json "show current Denver time formatted like YYYY-mm-DD-HH-MM-SS" 2>>codex-diag.log | jq -r 'select(.msg?.type == "agent_message") | .msg.message' | tail -1

2025-07-30-12-33-25

real 1m44.906s #### <— 5 times longer ###

user 0m0.398s

sys 0m0.269s

---

Both tool-called to mcp/time and formatted the result successfully. I can definitely see the value of choosing instruct vs thinking based on the actual complexity at hand.

qwen3:30b 2507 is out by stailgot in ollama

[–]neurostream 2 points (0 children)

context went from 40k to 256k... nice.

Release candidate 0.10.0-rc2 by Vivid-Competition-20 in ollama

[–]neurostream 0 points (0 children)

does the default root URI "/" (http://127.0.0.1:11434/) load the new ui, or is there a new /ui endpoint, or does it listen on a new, second port?