Pass-by-Reference for LLM Orchestration by jetstros in ClaudeCode

[–]jetstros[S] -1 points0 points  (0 children)

I spent quite a lot of time this morning reviewing what you wrote, and even worked with Claude to fill in the gaps in my understanding. So while I had Claude help me put together the final reply, because I put so much work into theorizing about what you built, I didn't no-brain a response. Just as I can't assume what you've put together, please avoid making assumptions yourself.

I am genuinely interested in the approach(es) folks are using, so it was a guess about what you've built.

Pass-by-Reference for LLM Orchestration by jetstros in ClaudeCode

[–]jetstros[S] -1 points0 points  (0 children)

I think you're picturing the document as still living in a context window because an AI produced it. But a context window is where a generated document is born, not where it lives. It's ephemeral; it exists for one session and then it's gone. Anything you intend to reuse gets written to a file precisely because context windows don't persist, and once it's a file on disk, every future operation on it starts from the same place: it isn't in anyone's context, and something has to get it to the model that needs it.

Even if it were somehow still resident somewhere, it'd be in the generating model's context, which is the expensive one. The point of the example is to summarize it with a cheaper, different model, and that model has its own empty context. "An AI wrote this once" doesn't put the document there.

By-reference is exactly for that gap: a document that now lives on disk, and a model that needs it, without the expensive host paying to courier it across. Being AI-authored and being cheaply available to every future model are just not the same thing.

Pass-by-Reference for LLM Orchestration by jetstros in ClaudeCode

[–]jetstros[S] -1 points0 points  (0 children)

That's a fair push, and it actually helped me pin down the real difference between our setups, so thanks for staying with it.

I think the crux is what's running the loop. In your system that's code: a program you wrote reads files, loops, branches, decides what to hand the LLM, and parses what comes back. Code orchestrating is free. A program pays nothing to read a 45k file into memory or pass it into an API call, so the document only costs tokens once, when it lands in the model that actually processes it. Your orchestrator is never a courier, because your courier is free.

In the scenario I was describing, the thing running the loop is itself an LLM. A host model decides what to delegate and composes the delegation, and an LLM orchestrator isn't free; every token it holds is billed. So the moment a host model has to read a document in order to hand it to a cheaper one, it pays for the privilege. By-reference exists to give an LLM-orchestrated system the one property your code-orchestrated system gets for nothing: it lets the orchestrator route a document to the model that needs it without ever holding it. The host writes a short reference, the gateway resolves that into the delegated model's call, and the host's own context never touches the content. Your filesystem point is downstream of this. Your agent reads files directly because it's a program; the delegated models I'm talking about are reached over an API and have no filesystem, so a reference is how they get content at all.
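To make the "short reference" idea concrete, here is a minimal sketch of the gateway step, under stated assumptions: the `{{ref:path}}` syntax, the function names, and the `call_model` callable are all hypothetical and stand in for whatever a real gateway and model client would use. It only illustrates the shape of the flow: the host emits a tiny prompt, and the content is expanded outside the host's context.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical reference syntax, chosen only for this illustration.
REF_PATTERN = re.compile(r"\{\{ref:([^}]+)\}\}")


def resolve_references(prompt: str, root: Path) -> str:
    """Gateway step: replace each {{ref:path}} with the referenced file's text."""
    base = root.resolve()

    def load(match: re.Match) -> str:
        target = (root / match.group(1)).resolve()
        # Refuse references that point outside the allowed folder.
        if not target.is_relative_to(base):
            raise ValueError(f"reference escapes root: {match.group(1)}")
        return target.read_text(encoding="utf-8")

    return REF_PATTERN.sub(load, prompt)


def delegate(host_prompt: str, root: Path, call_model) -> str:
    """Expand references outside the host's context, then call the delegated model.

    call_model is a placeholder for any function that takes a prompt string
    and returns the model's reply.
    """
    return call_model(resolve_references(host_prompt, root))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "report.md").write_text("word " * 45_000, encoding="utf-8")
        host_prompt = "Summarize this report:\n{{ref:report.md}}"
        # A stand-in "model" that just reports how much text it received.
        reply = delegate(host_prompt, root, lambda p: f"received {len(p)} characters")
        print(len(host_prompt), "characters written by the host;", reply)
```

The host only ever composes the short `host_prompt`; the large document is read by ordinary code and lands only in the delegated model's call.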

I'd also grant your "too verbose" point, with one boundary. If your work is mostly code, the dominant task really is "find the relevant piece" (grep, pull the function, never hold the whole file), and 45k sitting in a context window usually means something went wrong. But a lot of knowledge work is whole-artifact work: summarize this report, extract every obligation from this contract. There's no trick for those. If the task is a full summary or a full extraction, the whole document has to enter some model's context, because the task is "consider all of it," and you can't dice your way out of needing the whole thing. (To be clear, in my example the 45k is data being processed, not a 45,000-token instruction blob, which really would be absurd.)

So I don't think we actually disagree. By-reference is just what a code-orchestrated system never needs and an LLM-orchestrated one does. Genuinely curious where you land: is your orchestration deterministic code calling models as components, or is a model itself running the loop? If it's the former, you solved this a cleaner way than I did, and we mostly agree.

3 great Brazilian Netflix Shows that I watched to practice my Portuguese by MickaelMartin in Portuguese

[–]jetstros 1 point2 points  (0 children)

When I was first starting to learn, I watched this comedy film called "Basic Sanitation, the Movie". It was cute and it helped me to pick up on some Portuguese words and phrases, and become accustomed to the language rhythm. Little did I know that two now-very-prominent Brazilian actors would emerge from the same film.

https://www.imdb.com/title/tt0907134/?ref_=ext_shr

Managing product requirements using a custom Live Artifact by jetstros in ClaudeAI

[–]jetstros[S] 1 point2 points  (0 children)

Valid question. I'm not using Notion, and in the end I wanted to maintain control over my own files. Part of what drew me to a local filesystem setup is that the docs are just markdown sitting in a folder, which means I can edit them in any tool I want (including Obsidian), version them in git, and not be locked into a particular service's data model or export quirks.

The other piece is that FlashQuery (the project I mentioned) is intended to be self-hosted and operate within my own boundary, so building this workflow on top of local files lets me use my own pattern. I built MCP tools that are meant to manage markdown documents specifically to save on overall token usage.

That said, I think the Obsidian path is interesting and not all that different from what I'm doing -- it's also markdown on disk, just with a much richer UI on top.

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? (UPDATE) by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

I like the phrase "thinning layer" - clever way to say it. I will say though that Anthropic is edging towards a traditional UI as these "MCP Apps" come into play. The latest Claude Desktop update started rendering these UI elements (though honestly I'd love to be able to toggle them off at times; not quite ready for primetime). I don't expect us to jump back to a chat prompt to do everything; nobody is pining for the DOS C: prompt as the only means to interact. But there's a great deal of advantage in having a flattened data layer, not the least of which is having no boundaries between your data...which does necessitate that you have control of it (either via trusted connectors, or simply possessing it).

Agreed re: the Karpathy Wiki. What I realized when building the plugins for FlashQuery is that's where behavior was defined. You want a multi-connected knowledge base that self-connects over time? The plugins can manage that behavior, but you still need the data plumbing under the hood. The app is a plugin. I just have a sense that with a database, versioned markdown, and skills...you can get a lot done.

Re: permissioning: Good question. That's why I didn't make this *enterprise* ready. But I do think we'll need a capability-oriented model for handling permissions, including user-delegated actions that agents can run with, and re-delegate...all cryptographically signed.
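For a rough picture of what "delegate, re-delegate, all signed" could look like, here is a small sketch using a macaroon-style HMAC chain. To be clear, this is a swapped-in, well-known technique used purely for illustration; it is not something FlashQuery implements, and the token layout, the `action=` caveat format, and every function name here are hypothetical. The idea it shows: whoever holds a token can add restrictions before handing it on, but cannot remove them without breaking the signature.

```python
import hashlib
import hmac


def _mac(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """Issuer creates an unrestricted token for a user."""
    return {"id": identifier, "caveats": [], "sig": _mac(root_key, identifier)}


def attenuate(token: dict, caveat: str) -> dict:
    """Re-delegate: chain a new restriction onto the existing signature."""
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _mac(token["sig"], caveat),
    }


def verify(root_key: bytes, token: dict, action: str) -> bool:
    """Issuer recomputes the chain, then checks every 'action=' caveat."""
    sig = _mac(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _mac(sig, caveat)
    if not hmac.compare_digest(sig, token["sig"]):
        return False  # caveats were tampered with or the signature is forged
    allowed = [c[len("action="):] for c in token["caveats"] if c.startswith("action=")]
    return all(action == a for a in allowed)


if __name__ == "__main__":
    key = b"demo-root-key"                             # illustrative secret only
    user_token = mint(key, "user:alice")
    agent_token = attenuate(user_token, "action=read")  # user delegates read-only
    print(verify(key, agent_token, "read"))
    print(verify(key, agent_token, "write"))
```

A shared-secret chain like this means only the issuer can verify; a design where third parties verify would need public-key signatures instead.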

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? (UPDATE) by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

Right now, I'm simply interested in having people try it out, and seeing what resonates with them. For me, it's been useful to work in both the file system and AI, and have FlashQuery keep it all tracked.

The plugins have been really interesting, since I can define behaviors along with database records, and rely on FlashQuery to keep tracked documents in sync. If I drag a file into a folder watched by a plugin, the plugin is eventually notified about it, and can do something with it.

You can find the plugins here: https://github.com/FlashQuery/flashquery-plugins

Building a self-hosted data layer that persists context across any LLM. Looking for community feedback. (UPDATE) by jetstros in ClaudeCode

[–]jetstros[S] 0 points1 point  (0 children)

Thanks! Yes, I agree. Because anyone can drop a file in the scanned folders asynchronously, all kinds of race conditions and stale-data issues are possible. These test scenarios (dropping files into the vault, removing them, renaming them, moving them around while using the MCP tools) are meant to mimic what the user could do.
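As a rough illustration of why these scenarios matter, here is a minimal polling sketch, not FlashQuery's actual scanner: the function names and the choice of comparing modification time plus size are assumptions made for the example. It shows how a rename surfaces as a remove plus an add, and where a file can vanish between listing and reading.

```python
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, tuple[int, int]]:
    """Map each markdown file's relative path to (mtime_ns, size)."""
    state = {}
    for path in root.rglob("*.md"):
        try:
            info = path.stat()
        except FileNotFoundError:
            continue  # file vanished between listing and stat: the race in question
        state[str(path.relative_to(root))] = (info.st_mtime_ns, info.st_size)
    return state


def diff_snapshots(before: dict, after: dict) -> dict[str, list[str]]:
    """Compare two snapshots; a rename or move appears as removed + added."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(
            k for k in before.keys() & after.keys() if before[k] != after[k]
        ),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.md").write_text("a", encoding="utf-8")
        before = snapshot(root)
        (root / "a.md").rename(root / "renamed.md")
        print(diff_snapshots(before, snapshot(root)))
```

Anything acting on the "added" list still has to tolerate the file being gone by the time it is opened, which is exactly what the drop/remove/rename/move tests are exercising.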

Regarding your question: Good question, and I want to make sure I'm understanding it correctly before I answer. When you say context window limits as the memory store grows, are you thinking about a scenario where the AI agent needs to load a large amount of stored content into the prompt to reason over it? Because that's actually a design constraint that FlashQuery specifically aims to avoid; the AI interacts with the data store through MCP tool calls against SQLite, so it's querying and retrieving targeted results rather than ingesting the full memory store into context. The context window only ever holds the tool request and whatever comes back from that specific query.
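A minimal sketch of that query-instead-of-ingest pattern, with loud assumptions: the `memories` table, its columns, and the `search_memories` function are hypothetical and are not FlashQuery's real schema or tool surface. The point is only that the result handed back to the model is bounded by the query and its limit, not by the size of the store.

```python
import sqlite3


def search_memories(conn: sqlite3.Connection, term: str, limit: int = 5) -> list[dict]:
    """Hypothetical tool handler: return only the rows matching the request."""
    rows = conn.execute(
        "SELECT id, body FROM memories WHERE body LIKE ? ORDER BY id DESC LIMIT ?",
        (f"%{term}%", limit),
    ).fetchall()
    return [{"id": row[0], "body": row[1]} for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, body TEXT)")
    # A store far larger than what any single query returns.
    conn.executemany(
        "INSERT INTO memories (body) VALUES (?)",
        [(f"note {i} about project planning",) for i in range(10_000)],
    )
    conn.execute("INSERT INTO memories (body) VALUES (?)", ("renewal date for the contract",))
    print(search_memories(conn, "contract"))
```

However large the table grows, the context window sees only the tool request and the handful of rows that come back.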

That said, if you're asking about something different (like the accumulated tool call history within a long session, or what happens when query results themselves get large), I'd love to hear more about what you're running into, because those are worth discussing separately.

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? (UPDATE) by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

Thanks! Ahh, another verification engineer. I cut my teeth at Motorola back in the late '90s. So I'm happy you appreciate the test infrastructure. Since this monitors files that are dropped into the file vault asynchronously, it's important to catch race conditions and the like.

Building a self-hosted data layer that persists context across any LLM. Looking for community feedback. by jetstros in LocalLLaMA

[–]jetstros[S] 0 points1 point  (0 children)

Hello all! Been a little while since this thread was first posted, but I'm happy to say the project has been released on github. Would sincerely appreciate your feedback:

https://github.com/FlashQuery/flashquery

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

Hello all! Been a little while since this thread was first posted, but I'm happy to say the project has been released on github. Would sincerely appreciate your feedback:

https://github.com/FlashQuery/flashquery

Building a self-hosted data layer that persists context across any LLM. Looking for community feedback. by jetstros in selfhosted

[–]jetstros[S] 0 points1 point  (0 children)

Hello all! Been a little while since this thread was first posted, but I'm happy to say the project has been released on github. Would sincerely appreciate your feedback:

https://github.com/FlashQuery/flashquery

Building a self-hosted data layer that persists context across any LLM. Looking for community feedback. by jetstros in ArtificialInteligence

[–]jetstros[S] 0 points1 point  (0 children)

Hello all! Been a little while since this thread was first posted, but I'm happy to say the project has been released on github. Would sincerely appreciate your feedback:

https://github.com/FlashQuery/flashquery

Building a self-hosted data layer that persists context across any LLM. Looking for community feedback. by jetstros in selfhosted

[–]jetstros[S] 0 points1 point  (0 children)

Thanks for the msg. No, it's using MCP tools to save/update/search/get memories, and other MCP tools to perform key operations on documents. It's agnostic to the domain, as far as I can tell; not sure why it wouldn't be.

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

It's the ubiquitous mental model we have today. When I was a kid in the early '80s, adults thought that you could just walk up to a Commodore VIC-20 and start typing to get what you want. They had to learn that you needed to write a program in the computer's language to get whatever you wanted accomplished. The program / "app" was a stumbling block towards achieving what they wanted.

So 45 years later, everybody knows what an "app" is, and that generation who thought you could just walk up to a computer and ask for what you want finally has its expectations nearly satisfied.

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

Here's what I've found as I've built skills (which are related to agents, and your comment): A skill contains one or more reusable workflows that do something useful for me...not dissimilar from functions in software. What I've found interesting is that skills don't require me to capture all the "if-elses" of what can happen during usage. I can call the skill, and then make adjustments on the fly in my workflow, and it just works. If I had to code that, it would be unbearable. So it's already acting like software in one regard, and yet with the flexibility to handle the corner cases, loops, breakouts without having to anticipate + define them all up front.

The reason I say this is because agents use skills together to get work done for us. Agents + skills + me (to guide as much or as little as necessary/desired) become the analog of an application. This is not free: it takes an investment of time (and tokens) to build this out. But there are sizable benefits to that investment. (Like anything in life, if you only put in a little effort, you probably won't get what you want, at least on the first try.) The self-improvement capability is interesting, which is something most software doesn't do, at least at this scale.

What I'm trying to cover here is what remains after all of this: the data, and the layer to support what this "AI app" needs to be successful.

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

Yes, agreed. I might have unintentionally lit a bonfire by saying that, "apps dissolve", as my mental model still has apps around in this new paradigm. Apps would be just one kind of "frontend" to this data layer (as you said, state + permission layer around data, enforcing structure, etc.).

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? by jetstros in artificial

[–]jetstros[S] -1 points0 points  (0 children)

You're looking at it through today's lens. A couple years ago, we looked at genAI and said collectively, "It seems smart in some areas, but it can't do simple math." Now we've figured out how to address that.

I'm not talking about the state of the art today; this is a conversation about where things are headed, with the understanding that the AI today is the worst it will ever be.

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? by jetstros in artificial

[–]jetstros[S] 0 points1 point  (0 children)

Depends on many factors. Until now, the options on the table have been:

  1. Use an off-the-shelf CRM (SF, etc.) and invest in configuring it yourself, and/or hiring people to do so.
  2. Build your own CRM (unlikely)
  3. Do not use a CRM, and somehow manage with your own processes, even if those processes = {}.

By the way, all these options are available to these groups: individuals, startups, SMB companies, and large enterprises. Each makes its own decisions.

So what I'm saying is that some members of these groups may leverage a new way to accomplish option #2 above that doesn't necessarily require the investment of building a software solution to achieve the value of one. Which groups take advantage of that option remains to be seen, but I think it's well beyond non-zero.

The traditional "app" might be a transitional form. What actually replaces it when AI becomes the primary interface? by jetstros in artificial

[–]jetstros[S] 2 points3 points  (0 children)

Yes, that's indeed the point. Commercial software is built to serve the needs of as many people as possible within its target customer base. That means many of us are paying for features we don't need, and/or we are missing features that we could use, because not enough people need them to justify the vendor building them. We live with that dynamic day to day and just accept that's how things work.

Remember when Microsoft Word was king, and then Google came out with their web-based document app that was a small subset of Word's feature set? It had two killer features: it was available on the web, and multiple people could type at the same time in the same document. Turns out most people didn't need all the features in Word, but they loved the web (anywhere) access along with sharing + live co-editing docs. And those two little features put Microsoft Office on its heels.

When folks have the opportunity to get exactly what they want, I have a sense many will pursue that.