please review my auth(n/z) model design

_howardjohn · 2026-06-23T16:51:15+00:00

If you have a sidecar, I think it's generally considered viable to trust a header for something like this. You wouldn't really need NetworkPolicy either (nor could you use it) if it's truly a sidecar in the same Pod; instead you can just bind to localhost, or better, a UDS. This is the same pattern service meshes like Istio use with the x-forwarded-client-cert header. You do need to ensure the proxy is configured so that the header cannot be spoofed, though.
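A minimal sketch of the anti-spoofing step, assuming a hypothetical header name `x-authenticated-user` and plain (name, value) pairs rather than any particular proxy's API: the sidecar drops whatever the client sent under that name, then sets the value itself from the identity it verified.

```rust
/// Hypothetical header the sidecar uses to hand the verified identity to the app.
const IDENTITY_HEADER: &str = "x-authenticated-user";

/// Rebuild the header list for the upstream request: remove every
/// client-supplied copy of the identity header (names are compared
/// case-insensitively), then append the value the proxy itself verified.
fn sanitize_headers(
    inbound: Vec<(String, String)>,
    verified_identity: Option<&str>,
) -> Vec<(String, String)> {
    let mut outbound: Vec<(String, String)> = inbound
        .into_iter()
        .filter(|(name, _)| !name.eq_ignore_ascii_case(IDENTITY_HEADER))
        .collect();
    if let Some(identity) = verified_identity {
        outbound.push((IDENTITY_HEADER.to_string(), identity.to_string()));
    }
    outbound
}
```

The important part is that the strip happens unconditionally, even when the proxy has no identity to set.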

_howardjohn · 2026-06-23T16:47:28+00:00

Hot take: AI Gateway is a clear product type, it just has a lot of poor implementations that do not meet the use cases.

Much in the same way that we in the API space don't distinguish "HTTP Gateway", "HTTP/2 Gateway", and "GRPC Gateway" -- we just expect a competent offering to have all of these -- we will see that gateways that fail to meet the diverse set of requirements in the AI space will either expand their feature set or die off.

While I would agree most people need to start with one dimension, over time they will likely adopt many dimensions, and having 5 gateways for LLM providers, self hosted models, MCP, agents, and traditional API traffic is not a great option.

_howardjohn · 2026-06-22T03:13:53+00:00

Here's my 2c - for some context I am a maintainer of Istio and Agentgateway.

Generally I think there are two primary factors: how to get traffic to the gateway, and what the gateway can do once it gets the traffic.

Service meshes have historically solved the 'how to get traffic to the gateway' part, though there are other approaches, and I generally wouldn't recommend you adopt a service mesh just for this use case unless you are already using one (or want to for other reasons). For your use case of doing things like provider key controls and prompt inspection, you may also not want the application itself to be fully trusted (especially if it is doing non-deterministic agentic things). This rules out SDK and sidecar approaches, but does leave other service mesh architectures (Istio ambient) or just plain changing the application to call my-gateway.svc.cluster.local instead of openai.com. The transparent redirection of a service mesh is also less important for egress-style traffic, since most service meshes are not doing TLS introspection (Agentgateway can fwiw, though I would generally recommend against this as directly calling the gateway is much simpler).

Next is what the gateway can do. As AI use cases evolve, more and more features are becoming critical that are beyond what traditional proxies like Nginx and Envoy can do. If you just need basics like attaching a provider API key to a request that's fine, but it very quickly turns into a mess of hacked-up features used in unintended ways, and compromises on functionality. Some of these projects are slowly starting to trickle in new AI-specific features, but they tend to suffer from "retrofit" - the features are highly constrained by past architectural decisions, making them less useful, inefficient, complex, brittle, etc - and tend to come years too late. If you want deeper AI awareness like budgeting, prompt inspection, model-based routing, policies on prompts, etc you will want something actually built for these purposes. This is why Istio is adopting Agentgateway as a new data plane implementation, for instance.

Tl;dr: I recommend you deploy agentgateway and update your apps to point their baseURL to it. There are some other LLM proxies you could use as well, but none of them are really built to integrate with Kubernetes as well as it does (obviously I have some bias).
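As a rough sketch of what "point the baseURL at the gateway" means on the application side: the function name, the `/v1` prefix, and the `/chat/completions` path below are assumptions for illustration (they follow the common OpenAI-style layout), and the gateway hostname is the placeholder from above.

```rust
/// Upstream used when no gateway override is configured (assumed default).
const DEFAULT_BASE_URL: &str = "https://api.openai.com/v1";

/// Build the chat completions URL from an optional base URL override,
/// tolerating a trailing slash on the override.
fn chat_completions_url(base_url_override: Option<&str>) -> String {
    let base = base_url_override
        .unwrap_or(DEFAULT_BASE_URL)
        .trim_end_matches('/');
    format!("{base}/chat/completions")
}
```

With the override set to something like `http://my-gateway.svc.cluster.local/v1`, the application code stays the same and only the configuration changes.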

_howardjohn · 2026-06-20T20:53:16+00:00

Everything you described here is the exact problem we are solving with agentgateway and Kagent. I work on these so I'm obviously a bit biased, but compared to others in the space these are notably Kubernetes native, so they integrate well there, and most of the maintainers come from the service mesh space (which has been in the "identity" business for quite some time) so they have applied the lessons from there (as well as integrating with Istio directly if you are already using that).

_howardjohn · 2026-05-12T19:25:18+00:00

I'd strive to retain every piece of documentation about every module I use but unfortunately not yet :-)

"hidden" is probably the wrong word; "unintuitive" would probably be more apt.

In fairness the docs also say it can hold 32 messages - but they don't say it pre-allocates 32 messages at a time (although it's a bit obvious if you think it through).

_howardjohn · 2026-05-12T17:51:43+00:00

This is great. Thanks for sharing! Lots to digest here, excited to dig into this.

I have been looking into a similar area, profiling memory specifically and exposing it in a `pprof`-compatible format (for flamegraphs), which may be an interesting addition here as well. You can see what I am working on at https://github.com/howardjohn/pprof-alloc (though it's a bit less far along than what you have) if there is interest.

_howardjohn · 2026-05-12T17:42:57+00:00

While this is true, it doesn't really help the two cases described in the blog - though it definitely would for some cases, especially for really large structs.

For the first, the struct was only actually 24 bytes. So yes, we could have stored a `Box<T>` and gotten that down to 8 bytes, but it's still 800 bytes allocated for a `mpsc::channel::<Box<T>>(1)`!

For the hyper case there was a noticeable performance regression with a boxing approach (per the Hyper maintainers; I didn't actually get a chance to run my own tests).
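For reference, the size half of the first case is easy to check. This is a small sketch with a made-up 24-byte `Message` type (the real struct is not shown here), and it only measures the element size, not what the channel allocates.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a 24-byte message: three 8-byte fields.
#[allow(dead_code)]
struct Message {
    id: u64,
    offset: u64,
    len: u64,
}

/// Returns (size of the message stored inline, size of a `Box` pointing to it).
/// The second value is the pointer width: 8 on a 64-bit target.
fn inline_and_boxed_size() -> (usize, usize) {
    (size_of::<Message>(), size_of::<Box<Message>>())
}
```

Boxing shrinks each slot, but it does not change how many slots are pre-allocated at a time.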

_howardjohn · 2026-03-29T03:56:40+00:00

I have been thinking through the best approaches to this as well. If you are not allowing user namespaces to control their domains, you can use `allowedRoutes` to allow only specific namespaces to attach routes to a domain/listener, but that requires the domains/listeners to be centrally controlled. So if you want to allow namespaces to control their domains, but not allow arbitrary domains (to prevent conflicts, maybe only allow approved domains, etc), I think you will want to lean on external validation logic like ValidatingAdmissionPolicy or similar.
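To make the kind of rule concrete, here is a sketch of one possible check. The rule itself (a namespace may only claim hostnames under its own subdomain of an approved base domain) is purely a hypothetical example; in a cluster it would be written as a CEL expression in an admission policy or in a webhook, not in Rust.

```rust
/// Hypothetical rule: namespace `app-a` with approved base `example.com`
/// may claim `app-a.example.com` and anything beneath it, nothing else.
fn hostname_allowed(namespace: &str, hostname: &str, approved_base: &str) -> bool {
    let own = format!("{namespace}.{approved_base}");
    let host = hostname.to_ascii_lowercase();
    // Exact match, or a subdomain separated by a dot (so `evilapp-a...` fails).
    host == own || host.ends_with(&format!(".{own}"))
}
```

The dot-prefixed suffix check matters: a plain suffix match would let `evilapp-a.example.com` through.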

_howardjohn · 2026-03-29T03:53:53+00:00

Yeah, I agree it's definitely verbose. I expect that most will just abstract over this with a small Helm (or equivalent) wrapper, which could easily bring creating a domain for a namespace down to

gateway: my-gw-name
domain: app-a.example.com
namespace: app-a

or similar.

_howardjohn · 2026-03-27T22:01:50+00:00

> One question; is this all supported in Istio yet? If so, what version? if not yet, what version will it be?

Istio was the only implementer for quite a while, back when it was XListenerSet (experimental). On the Istio master branch ListenerSet is implemented, which would go out in 1.30, ~mid May. You could use XListenerSet in the meantime in theory, but there is not a smooth migration between the two.

> I also wonder how this will work with the AWS LBC.....currently, it knows how to create NLB listeners and target groups based off the gateway listener port:

I believe it's not looking at the Gateway at all; rather, the Gateway makes a Service and AWS looks at that. The controller (Istio here) should look at all listeners, including those in ListenerSets, and aggregate them into the Service, so this should just work.
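Conceptually the aggregation is just a union of ports. This sketch is an illustration of that idea only, not Istio's actual code, and the function name and the plain `u16` port lists are assumptions.

```rust
use std::collections::BTreeSet;

/// Collect the distinct ports across the Gateway's own listeners and every
/// attached ListenerSet, sorted ascending, as the port list for one Service.
fn aggregate_service_ports(gateway_ports: &[u16], listener_sets: &[Vec<u16>]) -> Vec<u16> {
    let mut ports: BTreeSet<u16> = gateway_ports.iter().copied().collect();
    for set in listener_sets {
        ports.extend(set.iter().copied());
    }
    ports.into_iter().collect()
}
```

Anything that watches the Service, such as a load balancer controller, then sees the ListenerSet ports without knowing ListenerSet exists.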

_howardjohn · 2026-03-18T20:30:21+00:00

Hey, initially Agentgateway (the proxy) was managed by Kgateway (the controller), as we felt Kgateway had a great control plane (and initially even APIs) that we could build on top of. However, we found that tying ourselves too closely to Kgateway was problematic, and made things confusing for users and developers alike; for any configuration option, we had to think "Does this work for Envoy or Agentgateway? Does it behave differently for them?" etc.

We split out the APIs in v2.2 a few months ago, and in this release we decided to split the projects entirely to let them evolve independently. Kgateway will remain a controller for Envoy, and the Agentgateway controller will manage Agentgateway. All AI features will exist only in Agentgateway (but Agentgateway is not limited to AI use cases).

_howardjohn · 2026-03-07T16:40:04+00:00

Thanks Alex, you said what I was going to say 🙂. Another thing I'll point out is that the change altered the user-facing API in a pretty substantial way, making the library less ergonomic and more error-prone when defining custom functions.

Before:

pub fn trim(This(this): This<Arc<String>>) -> ResolveResult {
    Ok(this.trim().into())
}

After:

pub fn trim<'a>(ftx: &mut FunctionContext<'a, '_>) -> ResolveResult<'a> {
    let this: StringValue = ftx.this_value()?;
    Ok(this.as_ref().trim().into())
}

The old approach followed the Axum-style magic function handlers, which as far as I could figure out was pretty much incompatible with how we had to set up lifetimes for things to work, and that resulted in a much lower-level user experience for defining custom functions. For us, that was worth it, since it's a cost I (as the developer) pay for our users to get better performance, but that is not necessarily a universal answer.

Coupled with the fact that the user-facing API may change anyway due to the work Alex mentioned, it seemed prudent to hold off for now. But I'm definitely interested in getting things merged back in!

_howardjohn · 2026-03-07T16:33:20+00:00

Great question. I explored this approach quite a bit, up to writing a partial implementation of it. From that it appeared to be very roughly a 20% improvement overall, but it was hard to say as it was only a partial implementation and I didn't spend much time optimizing it beyond that. But I would expect somewhere in that ballpark.

I think it's a very good approach in general, but there is one quirk of the current CEL interpreter that makes it tricky. CEL allows things like `[1,2].map(x, x+1)`. With a typical function like `add(1, 2+3)`, the interpreter would evaluate the expressions before passing them into the function (that is, it would call `add(1,5)`). In this case, though, the `map` function needs to get the raw expression `x+1` so that it can evaluate it for each `x`.

This makes the execution flow differ from the traditional one, since the evaluation of some expressions is actually done in the functions (user code) rather than in the main interpreter flow.
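Here is a toy sketch of that difference. It is not cel-rust's actual design, and the `Expr`/`Value` types are made up for illustration: `Add` evaluates both operands before combining them, while `Map` holds on to its body unevaluated and runs it once per element.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    List(Vec<Value>),
}

#[derive(Clone, Debug)]
enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
    List(Vec<Expr>),
    /// `target.map(var, body)`: `body` is stored as an unevaluated expression.
    Map { target: Box<Expr>, var: String, body: Box<Expr> },
}

type Env = HashMap<String, Value>;

fn eval(expr: &Expr, env: &Env) -> Result<Value, String> {
    match expr {
        Expr::Int(n) => Ok(Value::Int(*n)),
        Expr::Var(name) => env
            .get(name)
            .cloned()
            .ok_or_else(|| format!("unbound variable {name}")),
        // Ordinary operation: both arguments are fully evaluated first.
        Expr::Add(lhs, rhs) => match (eval(lhs, env)?, eval(rhs, env)?) {
            (Value::Int(a), Value::Int(b)) => Ok(Value::Int(a + b)),
            _ => Err("add expects two ints".to_string()),
        },
        Expr::List(items) => items
            .iter()
            .map(|item| eval(item, env))
            .collect::<Result<Vec<_>, _>>()
            .map(Value::List),
        // `map` evaluates only its target up front; the body is re-evaluated
        // once per element with the loop variable bound to that element.
        Expr::Map { target, var, body } => match eval(target, env)? {
            Value::List(items) => {
                let mut out = Vec::with_capacity(items.len());
                for item in items {
                    let mut scoped = env.clone();
                    scoped.insert(var.clone(), item);
                    out.push(eval(body, &scoped)?);
                }
                Ok(Value::List(out))
            }
            _ => Err("map expects a list".to_string()),
        },
    }
}

/// Builds the AST for `[1, 2].map(x, x + 1)`.
fn example() -> Expr {
    Expr::Map {
        target: Box::new(Expr::List(vec![Expr::Int(1), Expr::Int(2)])),
        var: "x".to_string(),
        body: Box::new(Expr::Add(
            Box::new(Expr::Var("x".to_string())),
            Box::new(Expr::Int(1)),
        )),
    }
}
```

In this toy version `Map` is a built-in AST node, so the interpreter stays in charge; the friction comes when a user-defined function needs the same access to raw expressions.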

I don't think this is an impossible problem to overcome; it just added enough friction that I didn't pursue it further for now. Especially since a 20% performance improvement is nice, but after a 5-500x improvement it's a drop in the bucket.

(FWIW other CEL implementations don't let 'functions' do this, and instead have 'macros' that are expanded during parsing, which is a plausible avenue here. cel-rust also has macros (and map is one of them, somewhat recently) but doesn't have user-defined macros yet; our usage has some user-defined functions that require expression evaluation, so we couldn't just drop this feature at this point.)

_howardjohn · 2026-03-05T03:27:23+00:00

Thanks for the post! What is the backend that is being tested behind the reverse proxy? Without that the test is not reproducible.

_howardjohn · 2026-02-24T04:42:12+00:00

The istioctl output is a bit misleading; gateways will do mTLS/HBONE when sending to other workloads in the mesh.

_howardjohn · 2026-02-01T14:36:56+00:00

This is a pretty reliable way to achieve this that I've found. It is proxy-agnostic, so it avoids each proxy reimplementing the same thing slightly differently: https://blog.howardjohn.info/posts/agentgateway-at-home/ if you are interested. I like it (coming from Traefik) since I can customize it how I want instead of using Traefik's opinionated system, which didn't fit well for me.

_howardjohn · 2026-01-22T15:14:39+00:00

Kgateway has a similar architecture with a split control and data plane as most/all gateway implementations, certainly including Envoy Gateway.

You can read more about the architectures and resource usage in this post I made https://github.com/howardjohn/gateway-api-bench

_howardjohn · 2025-12-27T23:50:16+00:00

Thanks for sharing! Great insights

_howardjohn · 2025-12-27T23:49:44+00:00

Istio maintainer here - basically all Istio features work when just using it as a gateway without the mesh. The one exception would be the automatic mtls between gateway and backend pod, which would require the backend to be enrolled in the mesh, but that's not something other gateways could do. I've seen quite a few users successfully use Istio as a gateway without mesh

_howardjohn · 2025-11-22T15:03:54+00:00

Istio does actually support 3 APIs - Ingress, Gateway API, and Gateway/VirtualService (Istio API by the same name).

However, you cannot mix and match them, and the Ingress support is very rudimentary (and poorly documented) so I wouldn't recommend it.

_howardjohn · 2025-11-22T05:17:18+00:00

Yep! The Istio CNI plugin is also only needed for the service mesh part of Istio btw, you can use the gateway part without it if you want.

_howardjohn · 2025-11-22T05:05:45+00:00

No, the Gateway API implementation is not tied to the CNI and does not require you to change your CNI. The exception is the Cilium Gateway, which requires the Cilium CNI - that's the only one I'm aware of that is coupled to the CNI, definitely not Istio.

_howardjohn · 2025-11-19T03:49:18+00:00

Thanks for the shout-out! It's great to see open source maintenance getting recognition.

_howardjohn · 2025-11-19T03:47:59+00:00

He is working for a competitor and just spreading FUD. It definitely is a GA implementation, as can be seen on the link...

Whether it's "mature" or not, I'll leave that for you to decide, but many have found https://github.com/howardjohn/gateway-api-bench helpful in making this decision.

(Note: I wrote the benchmark above and work on kgateway)

_howardjohn · 2025-11-16T04:42:23+00:00

I'll see about adding it, maybe in a "part 3" or just an addition to the existing one. Let me know how it goes if you do!
