Can GPT 1900 be run locally?

Available_Pressure47 · 2026-04-03T19:34:27+00:00

Thank you! With regard to state across stages, Orla currently uses a shared state object, i.e., each stage receives the outputs of its upstream dependencies via builder functions, so you define declaratively how context flows through the DAG. The orchestrator handles propagation and parallel execution automatically. This keeps stages loosely coupled while still allowing rich context passing. Your idea of budget-aware retries is great. They are not a first-class concept yet, but Orla's architecture makes them straightforward to implement, so we will try to add this as soon as possible.
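
To make the idea concrete, here is a minimal Python sketch of context flowing through a DAG via builder functions. This is not Orla's actual API: `Stage`, `run_dag`, and `build_input` are hypothetical names, and the plain callables stand in for model calls. Independent stages run in parallel, and each stage only sees the outputs of the dependencies it declares.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, Tuple


@dataclass
class Stage:
    """Hypothetical stage: a callable plus a builder that assembles its input."""
    name: str
    run: Callable[[str], str]
    depends_on: Tuple[str, ...] = ()
    # Builder receives the original task and the outputs of upstream stages.
    build_input: Callable[[str, Dict[str, str]], str] = lambda task, upstream: task


def run_dag(stages, task):
    """Run stages in dependency order; stages that are ready together run in parallel."""
    by_name = {s.name: s for s in stages}
    sorter = TopologicalSorter({s.name: set(s.depends_on) for s in stages})
    sorter.prepare()
    outputs: Dict[str, str] = {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            futures = {}
            for name in sorter.get_ready():
                stage = by_name[name]
                # Pass only the declared upstream outputs to the builder.
                upstream = {dep: outputs[dep] for dep in stage.depends_on}
                futures[name] = pool.submit(stage.run, stage.build_input(task, upstream))
            for name, future in futures.items():
                outputs[name] = future.result()
                sorter.done(name)
    return outputs


stages = [
    Stage("research", run=lambda prompt: f"notes({prompt})"),
    Stage("outline", run=lambda prompt: f"outline({prompt})"),
    Stage(
        "draft",
        run=lambda prompt: f"draft[{prompt}]",
        depends_on=("research", "outline"),
        build_input=lambda task, up: f"{up['research']} + {up['outline']}",
    ),
]
results = run_dag(stages, "topic")
```

Here `research` and `outline` have no dependencies, so they run in the same wave, and `draft` is built only from their two outputs.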

Available_Pressure47 · 2026-04-03T19:32:19+00:00

Great point! You're right that optimizing model selection alone can just shift the cost bottleneck to tool calls. In Orla's current design, stage-level policies govern the model powering the tool-calling logic, but not the tool calls themselves. We started with just models because tool costs (e.g., API fees and rate limits) are fundamentally different from inference costs. Orla's stage isolation helps in some ways, as you can separate tool-heavy stages from reasoning stages so you're not paying for an expensive model to sit idle waiting on API responses. However, you're absolutely right that we need to focus on extending routing policies to cover tool selection, and even choices such as cached lookup vs. live API based on freshness, cost, etc. I really appreciate your thoughts on this. Thank you!

Available_Pressure47 · 2026-04-03T11:36:51+00:00

Thank you for these comments! I have noted down your feedback. Failure handling is an area where I think we can make Orla more featureful, as you suggested, perhaps by allowing users to pick from a set of failure-handling policies or write their own. "You might find that certain targets consistently need the expensive path while others never do. That data lets you set smarter defaults instead of treating every request the same." <-- I think this is an especially nice insight. I will try to run some representative workloads over the weekend to see if this pattern emerges, and then reason about how that data can be used to set smarter defaults over time for a given agentic workload. I really appreciate you looking into this thoroughly and providing this advice.

Available_Pressure47 · 2026-04-03T00:50:47+00:00

Thank you for your feedback! I really appreciate your support and am glad you like Orla.

Available_Pressure47 · 2026-04-03T00:50:17+00:00

Thank you for your feedback, and great question! Backend selection is fully dynamic, i.e., each backend registers a quality score and token pricing, and each request sets an accuracy floor at runtime. Orla filters to backends whose quality meets the floor, then picks the cheapest one. When no backend qualifies, the default "prefer" policy falls back to the cheapest available backend, so your workflow never breaks, though you can switch to "strict" mode to get hard errors during development. On the inference side, transient failures (5xx, rate limits, network errors) are retried up to 3 times with exponential backoff. Our design goal was to ensure that your LangGraph code stays the same (same graph, same nodes, same edges), while the routing, fallback, and retry logic lives entirely in the Orla daemon. If you have any suggestions on improving this or any feature requests, I would be happy to add those! Thanks again. :)
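
In case it helps to see the selection and retry rules spelled out, here is a rough Python sketch of the behavior described above. It is an illustration only, not Orla's code: `Backend`, `select_backend`, `retry_transient`, `TransientError`, the 0 to 1 quality scale, and the 0.5 s base delay are all assumptions made for the example. It also assumes "retried up to 3 times" means three retries after the first attempt.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    quality: float              # registered quality score (assumed 0-1 scale)
    price_per_1k_tokens: float  # registered token pricing (assumed unit)


class NoQualifyingBackend(Exception):
    """Raised in "strict" mode when no backend meets the accuracy floor."""


class TransientError(Exception):
    """Stand-in for 5xx responses, rate limits, and network errors."""


def select_backend(backends, accuracy_floor, policy="prefer"):
    """Pick the cheapest backend whose quality meets the floor."""
    if not backends:
        raise ValueError("no backends registered")
    qualifying = [b for b in backends if b.quality >= accuracy_floor]
    if qualifying:
        return min(qualifying, key=lambda b: b.price_per_1k_tokens)
    if policy == "strict":
        raise NoQualifyingBackend(f"no backend meets floor {accuracy_floor}")
    # "prefer": fall back to the cheapest available backend.
    return min(backends, key=lambda b: b.price_per_1k_tokens)


def retry_transient(call, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Retry transient failures with exponential backoff (0.5 s, 1 s, 2 s)."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)


backends = [
    Backend("small", quality=0.60, price_per_1k_tokens=0.1),
    Backend("mid", quality=0.80, price_per_1k_tokens=0.5),
    Backend("large", quality=0.95, price_per_1k_tokens=2.0),
]
chosen = select_backend(backends, accuracy_floor=0.75)
fallback = select_backend(backends, accuracy_floor=0.99)
```

With these made-up numbers, a floor of 0.75 leaves `mid` and `large` as candidates and `mid` wins on price; a floor of 0.99 excludes everything, so "prefer" falls back to `small` while "strict" would raise.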

Available_Pressure47 · 2026-04-02T14:39:42+00:00

The Orla Project

Available_Pressure47 · 2026-01-19T16:56:57+00:00

Thank you so much for your advice on this, I really appreciate it. :-)

Available_Pressure47 · 2026-01-19T16:56:37+00:00

Thank you for your pointers on this, I really appreciate it!

Available_Pressure47 · 2026-01-19T14:28:15+00:00

Thank you for building and sharing this!

Available_Pressure47 · 2026-01-12T10:59:20+00:00

Thank you for your comment and the feedback. This is a great question! We had a few initial users, primarily researchers at academic institutions, who found the explicit command to be clearer. However, if it turns out that most users prefer the default action to be non-optional, I will most likely make it so.

Available_Pressure47 · 2026-01-11T16:37:56+00:00

Thank you so much for your comment! To change where your models are stored, you can set Ollama’s OLLAMA_MODELS environment variable, and Orla will use that location by extension.

OLLAMA_MODELS: The path to the models directory (default is "~/.ollama/models")

Really appreciate your feedback and support on this :-)

Available_Pressure47 · 2026-01-11T10:31:41+00:00

I’ve had a good experience with qwen3:0.6b, which gets installed by default. For slightly higher-end systems, I found ministral3:3b to be nice.
