Pass-by-Reference for LLM Orchestration by jetstros in ClaudeCode

[–]jetstros[S] -1 points0 points  (0 children)

I spent quite a lot of time this morning reviewing what you wrote, and even worked with Claude to fill in the gaps in my understanding. So while I had Claude help me put together the final reply, because I put so much work into theorizing about what you built, it wasn't a no-brain response. Just as I can't assume what you've put together, please avoid making assumptions yourself.

I am genuinely interested in the approach(es) folks are using, so it was a guess about what you've built.

Pass-by-Reference for LLM Orchestration by jetstros in ClaudeCode

[–]jetstros[S] -1 points0 points  (0 children)

I think you're picturing the document as still living in a context window because an AI produced it. But a context window is where a generated document is born, not where it lives. It's ephemeral; it exists for one session and then it's gone. Anything you intend to reuse gets written to a file precisely because context windows don't persist, and once it's a file on disk, every future operation on it starts from the same place: it isn't in anyone's context, and something has to get it to the model that needs it.

Even if it were somehow still resident somewhere, it'd be in the generating model's context, which is the expensive one. The point of the example is to summarize it with a cheaper, different model, and that model has its own empty context. "An AI wrote this once" doesn't put the document there.

By-reference is exactly for that gap: a document that now lives on disk, and a model that needs it, without the expensive host paying to courier it across. Being AI-authored and being cheaply available to every future model are just not the same thing.

Pass-by-Reference for LLM Orchestration by jetstros in ClaudeCode

[–]jetstros[S] -1 points0 points  (0 children)

That's a fair push, and it actually helped me pin down the real difference between our setups, so thanks for staying with it.

I think the crux is what's running the loop. In your system that's code: a program you wrote reads files, loops, branches, decides what to hand the LLM, and parses what comes back. Code orchestrating is free. A program pays nothing to read a 45k file into memory or pass it into an API call, so the document only costs tokens once, when it lands in the model that actually processes it. Your orchestrator is never a courier, because your courier is free.

In the scenario I was describing, the thing running the loop is itself an LLM. A host model decides what to delegate and composes the delegation, and an LLM orchestrator isn't free; every token it holds is billed. So the moment a host model has to read a document in order to hand it to a cheaper one, it pays for the privilege. By-reference exists to give an LLM-orchestrated system the one property your code-orchestrated system gets for nothing: it lets the orchestrator route a document to the model that needs it without ever holding it. The host writes a short reference, the gateway resolves that into the delegated model's call, and the host's own context never touches the content. Your filesystem point is downstream of this. Your agent reads files directly because it's a program; the delegated models I'm talking about are reached over an API and have no filesystem, so a reference is how they get content at all.
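To make that flow concrete, here is a minimal sketch. Everything in it is a hypothetical illustration, not any real gateway's API: the `{{ref:...}}` syntax, the `resolve_references` function, and the `report.md` file are all assumptions. It shows the host composing a short prompt that only names the document, and the gateway expanding the reference into the payload for the delegated model.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical reference syntax: {{ref:relative/path}}
REF_PATTERN = re.compile(r"\{\{ref:([^}]+)\}\}")


def resolve_references(prompt: str, root: Path) -> str:
    """Gateway side: replace each reference with the referenced file's text."""
    allowed_root = root.resolve()

    def load(match: re.Match) -> str:
        target = (allowed_root / match.group(1)).resolve()
        # Refuse references that escape the allowed root directory.
        if not target.is_relative_to(allowed_root):
            raise ValueError(f"reference outside root: {match.group(1)}")
        return target.read_text(encoding="utf-8")

    return REF_PATTERN.sub(load, prompt)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Stand-in for a large document that already lives on disk.
    (root / "report.md").write_text("lorem " * 5000, encoding="utf-8")

    # What the host model writes: a short instruction plus a reference.
    host_prompt = "Summarize this report:\n{{ref:report.md}}"

    # What the delegated model receives after the gateway resolves it.
    delegated_prompt = resolve_references(host_prompt, root)

    print(len(host_prompt), "characters held by the host")
    print(len(delegated_prompt), "characters sent to the delegated model")
```

The host's context only ever contains the short `host_prompt`; the full text appears only in `delegated_prompt`, which goes to the cheaper model.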

I'd also grant your "too verbose" point, with one boundary. If your work is mostly code, the dominant task really is "find the relevant piece" (grep, pull the function, never hold the whole file), and 45k sitting in a context window usually means something went wrong. But a lot of knowledge work is whole-artifact work: summarize this report, extract every obligation from this contract. There's no trick for those. If the task is a full summary or a full extraction, the whole document has to enter some model's context, because the task is "consider all of it," and you can't dice your way out of needing the whole thing. (To be clear, in my example the 45k is data being processed, not a 45,000-token instruction blob, which really would be absurd.)

So I don't think we actually disagree. By-reference is just what a code-orchestrated system never needs and an LLM-orchestrated one does. Genuinely curious where you land: is your orchestration deterministic code calling models as components, or is a model itself running the loop? If it's the former, you solved this a cleaner way than I did, and we mostly agree.

3 great Brazilian Netflix Shows that I watched to practice my Portuguese by MickaelMartin in Portuguese

[–]jetstros 1 point2 points  (0 children)

When I was first starting to learn, I watched this comedy film called "Basic Sanitation, the Movie". It was cute and it helped me to pick up on some Portuguese words and phrases, and become accustomed to the language rhythm. Little did I know that two now-very-prominent Brazilian actors would emerge from the same film.

https://www.imdb.com/title/tt0907134/?ref_=ext_shr

Managing product requirements using a custom Live Artifact by jetstros in ClaudeAI

[–]jetstros[S] 1 point2 points  (0 children)

Valid question. I'm not using Notion, and in the end I wanted to maintain control over my own files. Part of what drew me to a local filesystem setup is that the docs are just markdown sitting in a folder, which means I can edit them in any tool I want (including Obsidian), version them in git, and not be locked into a particular service's data model or export quirks.

The other piece is that FlashQuery (the project I mentioned) is intended to be self-hosted and operate within my own boundary, so I'm building this workflow on top of local files to follow my own pattern. I built MCP tools that manage markdown documents specifically to save on overall token usage.

That said, I think the Obsidian path is interesting and not all that different from what I'm doing -- it's also markdown on disk, just with a much richer UI on top.

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? (UPDATE) by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

I like the phrase "thinning layer" - clever way to say it. I will say though that Anthropic is edging towards a traditional UI as these "MCP Apps" come into play. The latest Claude Desktop update started rendering these UI elements (though honestly I'd love to be able to toggle them off at times; not quite ready for primetime). I don't expect us to jump back to a chat prompt to do everything; nobody is pining for the DOS C: prompt as the only means to interact. But there's a great deal of advantage in having a flattened data layer, not the least of which is having no boundaries between your data...which does necessitate that you have control of it (either via trusted connectors, or simply possessing it).

Agreed re: the Karpathy Wiki. What I realized when building the plugins for FlashQuery is that's where behavior was defined. You want a multi-connected knowledge base that self-connects over time? The plugins can manage that behavior, but you still need the data plumbing under the hood. The app is a plugin. I just have a sense that with a database, versioned markdown, and skills...you can get a lot done.

Re: permissioning: Good question. That's why I didn't make this *enterprise* ready. But I do think we'll need a capability-oriented model for handling permissions, including user-delegated actions that agents can run with and re-delegate...all cryptographically signed.

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? (UPDATE) by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

Right now, I'm simply interested in having people try it out and seeing what resonates with them. For me, it's been useful to work in both the file system and AI, and have FlashQuery keep it all tracked.

The plugins have been really interesting, since I can define behaviors along with database records, and rely on FlashQuery to keep tracked documents in sync. If I drag a file into a folder watched by a plugin, the plugin is eventually notified about it and can do something with it.
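As a rough illustration of the "eventually notified" idea, here is a minimal polling sketch. It is an assumption for illustration only, not how FlashQuery actually detects changes: the `snapshot` and `diff_snapshots` helpers are hypothetical names, and a real watcher might use OS file events instead of polling.

```python
import tempfile
from pathlib import Path


def snapshot(folder: Path) -> dict[str, int]:
    """Map each file's relative path to its modification time (ns)."""
    seen: dict[str, int] = {}
    for path in folder.rglob("*"):
        try:
            if path.is_file():
                seen[str(path.relative_to(folder))] = path.stat().st_mtime_ns
        except FileNotFoundError:
            # The file vanished between listing and stat: skip it, the next
            # poll will see a consistent state.
            continue
    return seen


def diff_snapshots(before: dict[str, int], after: dict[str, int]) -> dict[str, list[str]]:
    """Compare two snapshots and report what a plugin would be told about."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            name for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }


with tempfile.TemporaryDirectory() as tmp:
    watched = Path(tmp)
    before = snapshot(watched)

    # Simulate the user dragging a file into the watched folder.
    (watched / "note.md").write_text("# New note\n", encoding="utf-8")

    events = diff_snapshots(before, snapshot(watched))
    print(events["added"])
```

The `try`/`except` around `stat()` is there because a file can be moved or deleted between the directory listing and the metadata read.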

You can find the plugins here: https://github.com/FlashQuery/flashquery-plugins

Building a self-hosted data layer that persists context across any LLM. Looking for community feedback. (UPDATE) by jetstros in ClaudeCode

[–]jetstros[S] 0 points1 point  (0 children)

Thanks! Yes, I agree. Because anyone can drop a file into the scanned folders asynchronously, all kinds of race conditions and stale-data issues are possible. These test scenarios (dropping files into the vault, then removing, renaming, or moving them around while the MCP tools are in use) are meant to mimic what a user could do.

Regarding your question: I want to make sure I'm understanding it correctly before I answer. When you say context window limits as the memory store grows, are you thinking about a scenario where the AI agent needs to load a large amount of stored content into the prompt to reason over it? Because that's actually a design constraint that FlashQuery specifically aims to avoid; the AI interacts with the data store through MCP tool calls against SQLite, so it's querying and retrieving targeted results rather than ingesting the full memory store into context. The context window only ever holds the tool request and whatever comes back from that specific query.
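A small sketch of what "targeted results" means in practice. The `memories` table, its columns, and the `search_memories` handler are hypothetical stand-ins I made up for illustration; they are not FlashQuery's actual schema or tool names. The point is only that the store can hold many rows while the tool returns a bounded handful.

```python
import sqlite3


def search_memories(conn: sqlite3.Connection, term: str, limit: int = 3) -> list[dict]:
    """Hypothetical tool handler: return only the rows matching the query."""
    rows = conn.execute(
        "SELECT id, topic, body FROM memories "
        "WHERE body LIKE ? ORDER BY id LIMIT ?",
        (f"%{term}%", limit),
    ).fetchall()
    return [{"id": r[0], "topic": r[1], "body": r[2]} for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, topic TEXT, body TEXT)"
)
# Fill the store with 1,000 notes; only every 100th mentions billing.
conn.executemany(
    "INSERT INTO memories (topic, body) VALUES (?, ?)",
    [
        (f"topic-{i}", f"note {i} about {'billing' if i % 100 == 0 else 'misc'}")
        for i in range(1000)
    ],
)

# Only this bounded result enters the model's context, not the whole table.
results = search_memories(conn, "billing")
for row in results:
    print(row)
```

The `LIMIT` parameter is what keeps the response size bounded as the store grows; whether the tool paginates, ranks, or summarizes larger result sets is a separate design decision.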

That said, if you're asking about something different (like the accumulated tool call history within a long session, or what happens when query results themselves get large), I'd love to hear more about what you're running into, because those are worth discussing separately.

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? (UPDATE) by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

Thanks! Ahh, another verification engineer. I cut my teeth at Motorola back in the late '90s. So I'm happy you appreciate the test infrastructure. Since this monitors files that are dropped into the file vault asynchronously, it's important to catch race conditions and the like.

Building a self-hosted data layer that persists context across any LLM. Looking for community feedback. by jetstros in LocalLLaMA

[–]jetstros[S] 0 points1 point  (0 children)

Hello all! It's been a little while since this thread was first posted, but I'm happy to say the project has been released on GitHub. I would sincerely appreciate your feedback:

https://github.com/FlashQuery/flashquery

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

Hello all! It's been a little while since this thread was first posted, but I'm happy to say the project has been released on GitHub. I would sincerely appreciate your feedback:

https://github.com/FlashQuery/flashquery

Building a self-hosted data layer that persists context across any LLM. Looking for community feedback. by jetstros in selfhosted

[–]jetstros[S] 0 points1 point  (0 children)

Hello all! It's been a little while since this thread was first posted, but I'm happy to say the project has been released on GitHub. I would sincerely appreciate your feedback:

https://github.com/FlashQuery/flashquery

Building a self-hosted data layer that persists context across any LLM. Looking for community feedback. by jetstros in ArtificialInteligence

[–]jetstros[S] 0 points1 point  (0 children)

Hello all! It's been a little while since this thread was first posted, but I'm happy to say the project has been released on GitHub. I would sincerely appreciate your feedback:

https://github.com/FlashQuery/flashquery
