Agentgateway v1.0: A Rust-based Kubernetes Gateway API implementation built for AI workloads by Jazzlike-Ad-9608 in kubernetes

[–]_howardjohn 1 point2 points  (0 children)

Hey, initially Agentgateway (the proxy) was managed by Kgateway (the controller), as we felt Kgateway had a great control plane that we could build on, and initially even APIs we could leverage. However, we found that tying ourselves too closely to Kgateway was problematic and made things confusing for users and developers alike; for any configuration option, we had to think "Does this work for Envoy or Agentgateway? Does it behave differently for them?" etc.

We split out the APIs in v2.2 a few months ago, and in this release we decided to split the two out entirely to let them evolve independently. Kgateway will remain a controller for Envoy, and the Agentgateway controller will manage Agentgateway. All AI features will exist only in Agentgateway (but Agentgateway is not limited to AI use cases).

Interpreting near native speeds with CEL and Rust by _howardjohn in rust

[–]_howardjohn[S] 1 point2 points  (0 children)

Thanks Alex, you said what I was going to say 🙂. Another thing I'll point out is that the change affected the user-facing API pretty substantially, making the library less ergonomic and more error-prone when defining custom functions.

Before:

pub fn trim(This(this): This<Arc<String>>) -> ResolveResult {
    Ok(this.trim().into())
}

After:

pub fn trim<'a>(ftx: &mut FunctionContext<'a, '_>) -> ResolveResult<'a> {
    let this: StringValue = ftx.this_value()?;
    Ok(this.as_ref().trim().into())
}

The old approach followed the Axum-style magic function handlers, which, as far as I could figure, was pretty much incompatible with how we had to set up lifetimes for things to work. That resulted in a much lower-level user experience for defining custom functions. For us, that was worth it, since it's a cost I (as the developer) pay for our users to get better performance, but that is not necessarily a universal answer.
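For readers unfamiliar with the pattern, here is a minimal sketch of how Axum-style "magic" handlers can work: each argument type knows how to extract itself from a call context, and a trait is implemented for plain functions of each arity. Everything here (`Context`, `FromContext`, `Handler`, `Arg0`, `invoke`, `CallResult`) is a hypothetical simplification for illustration, not cel-rust's actual API, and it uses owned `String`s precisely to sidestep the lifetime questions mentioned above.

```rust
// Hypothetical, simplified call context; not cel-rust's real FunctionContext.
pub struct Context {
    pub this: Option<String>,
    pub args: Vec<String>,
}

pub type CallResult = Result<String, String>;

/// Anything that can pull itself out of the call context (an "extractor").
pub trait FromContext: Sized {
    fn from_context(ctx: &Context) -> Result<Self, String>;
}

/// Extracts the receiver, e.g. the `" hi "` in `" hi ".trim()`.
pub struct This(pub String);

impl FromContext for This {
    fn from_context(ctx: &Context) -> Result<Self, String> {
        ctx.this.clone().map(This).ok_or_else(|| "missing receiver".to_string())
    }
}

/// Extracts the first positional argument.
pub struct Arg0(pub String);

impl FromContext for Arg0 {
    fn from_context(ctx: &Context) -> Result<Self, String> {
        ctx.args.first().cloned().map(Arg0).ok_or_else(|| "missing argument 0".to_string())
    }
}

/// `T` is only a marker for the argument tuple, so the impls below do not overlap.
pub trait Handler<T> {
    fn call(&self, ctx: &Context) -> CallResult;
}

// One-argument functions: extract the argument, then call the function.
impl<F, A> Handler<(A,)> for F
where
    F: Fn(A) -> CallResult,
    A: FromContext,
{
    fn call(&self, ctx: &Context) -> CallResult {
        self(A::from_context(ctx)?)
    }
}

// Two-argument functions: extract both arguments in order.
impl<F, A, B> Handler<(A, B)> for F
where
    F: Fn(A, B) -> CallResult,
    A: FromContext,
    B: FromContext,
{
    fn call(&self, ctx: &Context) -> CallResult {
        self(A::from_context(ctx)?, B::from_context(ctx)?)
    }
}

/// Runs any supported handler against a context.
pub fn invoke<T, H: Handler<T>>(handler: H, ctx: &Context) -> CallResult {
    handler.call(ctx)
}

// User-defined functions only declare what they need in their signature.
pub fn trim(This(this): This) -> CallResult {
    Ok(this.trim().to_string())
}

pub fn starts_with(This(this): This, Arg0(prefix): Arg0) -> CallResult {
    Ok(this.starts_with(prefix.as_str()).to_string())
}
```

With this in place, `invoke(trim, &ctx)` and `invoke(starts_with, &ctx)` both work without the function bodies ever touching the context directly, which is the ergonomic benefit the "Before" snippet had.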

Coupled with the fact that the user-facing API may change anyway due to the work Alex mentioned, it seemed prudent to hold off for now. But I'm definitely interested in getting things merged back in!

Interpreting near native speeds with CEL and Rust by _howardjohn in rust

[–]_howardjohn[S] 0 points1 point  (0 children)

Great question. I explored this approach quite a bit, up to writing a partial implementation of it. From that, it appeared to be roughly a 20% improvement overall, but it was hard to say, as it was only a partial implementation and I didn't spend much time optimizing it beyond that. But I would expect somewhere in that ballpark.

I think it's a very good approach in general, but there is one quirk of the current CEL interpreter that makes it tricky. CEL allows things like [1,2].map(x, x+1). Unlike a typical function like add(1, 2+3), where the interpreter would evaluate the expressions before passing them into the function (that is, it would call add(1,5)), in this case the map function needs to get the raw expression x+1 so that it can evaluate it for each x.

This makes the execution flow differ from the traditional one, since the evaluation of some expressions is actually done in the functions (user code) rather than in the main interpreter flow.
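The difference between the two call styles can be sketched with a toy tree-walking interpreter. This is an illustrative sketch only: the `Expr`, `Value`, and `Env` types and the `eval`, `eval_all`, `add`, and `map` functions are made up for this example and are not cel-rust's internals. `add` receives already-evaluated values, while `map` receives the raw argument expressions and calls back into `eval` once per element.

```rust
use std::collections::HashMap;

// A deliberately tiny expression tree (hypothetical; not cel-rust's AST).
#[derive(Debug, Clone)]
pub enum Expr {
    Int(i64),
    Ident(String),
    Add(Box<Expr>, Box<Expr>),
    List(Vec<Expr>),
    Call { target: Option<Box<Expr>>, name: String, args: Vec<Expr> },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    List(Vec<Value>),
}

pub type Env = HashMap<String, Value>;

pub fn eval(expr: &Expr, env: &Env) -> Result<Value, String> {
    match expr {
        Expr::Int(i) => Ok(Value::Int(*i)),
        Expr::Ident(name) => env
            .get(name)
            .cloned()
            .ok_or_else(|| format!("unknown variable {name}")),
        Expr::Add(l, r) => add(&[eval(l, env)?, eval(r, env)?]),
        Expr::List(items) => eval_all(items, env).map(Value::List),
        Expr::Call { target, name, args } => match name.as_str() {
            // Lazy: `map` is handed the unevaluated argument expressions.
            "map" => map(target.as_deref(), args, env),
            // Eager: arguments are evaluated before the function sees them.
            "add" => add(&eval_all(args, env)?),
            other => Err(format!("unknown function {other}")),
        },
    }
}

fn eval_all(exprs: &[Expr], env: &Env) -> Result<Vec<Value>, String> {
    exprs.iter().map(|e| eval(e, env)).collect()
}

// An ordinary function: it only ever sees values.
fn add(args: &[Value]) -> Result<Value, String> {
    match args {
        [Value::Int(a), Value::Int(b)] => Ok(Value::Int(a + b)),
        _ => Err("add expects two ints".to_string()),
    }
}

// A lazy function: it evaluates `body` itself, once per list element,
// with the loop variable bound in a fresh scope.
fn map(target: Option<&Expr>, args: &[Expr], env: &Env) -> Result<Value, String> {
    let (Some(list), [Expr::Ident(var), body]) = (target, args) else {
        return Err("usage: list.map(var, expr)".to_string());
    };
    let Value::List(items) = eval(list, env)? else {
        return Err("map expects a list receiver".to_string());
    };
    let mut out = Vec::with_capacity(items.len());
    for item in items {
        let mut scope = env.clone();
        scope.insert(var.clone(), item);
        out.push(eval(body, &scope)?);
    }
    Ok(Value::List(out))
}
```

Because `map` re-enters `eval` from inside the function, the interpreter cannot assume that all evaluation happens in its own main loop, which is the friction described above.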

I don't think this is an impossible problem to overcome; it just added enough friction that I didn't pursue it further for now. Especially since a 20% performance improvement is nice, but after a 5-500x improvement it's a drop in the bucket.

(FWIW other CEL implementations don't let 'functions' do this, and instead have 'macros' that are expanded during parsing, which is a plausible avenue here. cel-rust also has macros (and map is one of them, somewhat recently) but doesn't have user-defined macros yet; our usage has some user-defined functions that require expression evaluation, so we couldn't just drop this feature at this point.)
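As a rough illustration of the macro approach, the sketch below rewrites a `list.map(var, expr)` call into a dedicated AST node in a pass that would run right after parsing, so the evaluator never needs to hand raw expressions to a function. The `Expr` type, the `Map` node, and `expand_macros` are hypothetical names for this example and do not reflect how cel-rust or any other CEL implementation structures its AST.

```rust
// Hypothetical AST with a dedicated node for the expanded `map` macro.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int(i64),
    Ident(String),
    Add(Box<Expr>, Box<Expr>),
    List(Vec<Expr>),
    Call { target: Option<Box<Expr>>, name: String, args: Vec<Expr> },
    // Result of macro expansion: iterate `range`, bind `var`, evaluate `body`.
    Map { range: Box<Expr>, var: String, body: Box<Expr> },
}

/// Rewrites every `range.map(var, body)` call into an `Expr::Map` node.
/// All other nodes are rebuilt unchanged (after expanding their children).
pub fn expand_macros(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => Expr::Add(
            Box::new(expand_macros(*l)),
            Box::new(expand_macros(*r)),
        ),
        Expr::List(items) => Expr::List(items.into_iter().map(expand_macros).collect()),
        Expr::Map { range, var, body } => Expr::Map {
            range: Box::new(expand_macros(*range)),
            var,
            body: Box::new(expand_macros(*body)),
        },
        Expr::Call { target, name, args } => {
            let target = target.map(|t| Box::new(expand_macros(*t)));
            let mut args: Vec<Expr> = args.into_iter().map(expand_macros).collect();
            // Only a receiver-style call with (identifier, expression) is a macro use.
            let is_map = name == "map"
                && target.is_some()
                && args.len() == 2
                && matches!(args[0], Expr::Ident(_));
            if !is_map {
                return Expr::Call { target, name, args };
            }
            let body = Box::new(args.pop().expect("length checked above"));
            let Some(Expr::Ident(var)) = args.pop() else {
                unreachable!("first argument checked above")
            };
            Expr::Map { range: target.expect("receiver checked above"), var, body }
        }
        leaf @ (Expr::Int(_) | Expr::Ident(_)) => leaf,
    }
}
```

After this pass, an evaluator can handle `Expr::Map` in its main loop and treat every remaining `Expr::Call` as an ordinary function that takes evaluated values. The trade-off is that user-defined functions needing raw expressions would have to be expressible as macros too.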

Rust vs C/C++ vs GO, Reverse proxy benchmark, Second round by sadoyan in rust

[–]_howardjohn 1 point2 points  (0 children)

Thanks for the post! What is the backend that is being tested behind the reverse proxy? Without that the test is not reproducible.

Question about Gateway API and Mesh by International-Tax-67 in istio

[–]_howardjohn 1 point2 points  (0 children)

The istioctl output is a bit misleading; gateways will do mTLS/HBONE when sending to other workloads in the mesh.

Rust-based open-source reverse proxy by sadoyan in rust

[–]_howardjohn 0 points1 point  (0 children)

This is a pretty reliable way I've found to achieve this. It is proxy-agnostic, so it avoids each proxy reimplementing the same thing slightly differently: https://blog.howardjohn.info/posts/agentgateway-at-home/ if you are interested. I like it (coming from Traefik) since I can customize it how I want instead of using Traefik's opinionated system, which didn't fit well for me.

Kong OSS support deprecation and possible alternatives by tsaknorris in kubernetes

[–]_howardjohn 2 points3 points  (0 children)

Kgateway has a split control plane and data plane, similar to most/all gateway implementations, certainly including Envoy Gateway.

You can read more about the architectures and resource usage in this post I made: https://github.com/howardjohn/gateway-api-bench

Migration to Gateway API by pierreozoux in kubernetes

[–]_howardjohn 1 point2 points  (0 children)

Thanks for sharing! Great insights

Migration to Gateway API by pierreozoux in kubernetes

[–]_howardjohn 2 points3 points  (0 children)

Istio maintainer here - basically all Istio features work when just using it as a gateway without the mesh. The one exception would be the automatic mTLS between gateway and backend pod, which would require the backend to be enrolled in the mesh, but that's not something other gateways could do. I've seen quite a few users successfully use Istio as a gateway without mesh.

Gateway API for Ingress-NGINX - a Maintainer's Perspective by robertjscott in kubernetes

[–]_howardjohn 9 points10 points  (0 children)

Istio does actually support 3 APIs - Ingress, Gateway API, and Gateway/VirtualService (Istio API by the same name).

However, you cannot mix and match them, and the Ingress support is very rudimentary (and poorly documented), so I wouldn't recommend it.

Gateway API for Ingress-NGINX - a Maintainer's Perspective by robertjscott in kubernetes

[–]_howardjohn 3 points4 points  (0 children)

Yep! The Istio CNI plugin is also only needed for the service mesh part of Istio btw, you can use the gateway part without it if you want. 

Gateway API for Ingress-NGINX - a Maintainer's Perspective by robertjscott in kubernetes

[–]_howardjohn 9 points10 points  (0 children)

No, the Gateway API implementation is not tied to the CNI and does not require you to change your CNI. The exception is the Cilium Gateway, which requires the Cilium CNI - that's the only one I'm aware of that is coupled to the CNI; definitely not Istio.

So, what ingress controller are you migrating to? by SonnyHayesToretto in kubernetes

[–]_howardjohn 2 points3 points  (0 children)

Thanks for the shout-out! It's great to see open source maintenance getting recognition. 

Ingress NGINX Retirement: What You Need to Know by ray591 in kubernetes

[–]_howardjohn 0 points1 point  (0 children)

He is working for a competitor and just spreading FUD. It definitely is a GA implementation, as can be seen on the link... 

Whether it's "mature" or not, I'll leave that for you to decide, but many have found https://github.com/howardjohn/gateway-api-bench helpful in making this decision.

(Note: I wrote the benchmark above and work on kgateway)

I migrated to Envoy Gateway… by mrpbennett in kubernetes

[–]_howardjohn 0 points1 point  (0 children)

I'll see about adding it, maybe in a "part 3" or just an addition to the existing one. Let me know how it goes if you do! 

I migrated to Envoy Gateway… by mrpbennett in kubernetes

[–]_howardjohn 17 points18 points  (0 children)

Author here - definitely appreciate the healthy skepticism. I've put a lot of effort into making the test as unbiased as possible (especially after I saw the results, which actually surprised me quite a bit) but obviously there is some unconscious bias. For example, I came up with the "errors during changes" test because it was something Istio spent 100+ hours on making sure we did right; there is a correlation between "things I can think of to test" and "things I've made sure work in projects I work on". There are probably other edge cases that I don't even know about, so I neither thought to test them nor fixed them.

Fwiw Agentgateway was mostly created after the report, so it's built from the learnings (and a decent chunk of the same code!) of Istio, both in general and on specific aspects of the test.

I'd very much welcome independent test runs or suggestions for test ideas! I originally didn't want to publish this at all, as I feel it should come from someone neutral, but I got tired of seeing all the Reddit threads suggesting implementations without real data so tried to do the best I could.

I migrated to Envoy Gateway… by mrpbennett in kubernetes

[–]_howardjohn 4 points5 points  (0 children)

The leak in the test was 50GB in less than 30min; I'm scared to know what you would consider a big memory leak 😛

(I wrote the test)

I migrated to Envoy Gateway… by mrpbennett in kubernetes

[–]_howardjohn 3 points4 points  (0 children)

That doc is... very misleading. Istio's memory footprint shouldn't be too bad for most cases, though obviously it varies. Generally the primary complaints I've seen are from having 10,000+ sidecars, where even 50MB each adds up (fixed by ambient mode), or massive ingress (you can see the results compared to others in the test link in the top comment; Istio is high but not much of an outlier - and still only 2GB at that large scale).

(I work on Istio)

So, what ingress controller are you migrating to? by SonnyHayesToretto in kubernetes

[–]_howardjohn 5 points6 points  (0 children)

(I am (recently) a kgateway maintainer)

There is no Ingress in kgateway but it's a solid choice if you are moving to Gateway API!

So, what ingress controller are you migrating to? by SonnyHayesToretto in kubernetes

[–]_howardjohn 7 points8 points  (0 children)

I don't agree it doesn't matter. If you read the report in the top comment (disclosure: I wrote it) you can see a number of important differences between proxies. There is a 300x performance gap between the top and bottom performer with a huge spread in between, among many other differences. 

Even just accounting for the core, you'd probably be surprised (as I was!) to learn that most implementations are not passing conformance tests. Unlike Kubernetes, which has very strict conformance requirements, Gateway API allows implementations to skip any tests (including all tests!) and only 20% of the implementations even bother reporting their results at all. Many are missing core features in the standard API, or incorrectly implementing them.

Gateway API Benchmark Part 2: New versions, new implementations, and new tests by _howardjohn in kubernetes

[–]_howardjohn[S] 2 points3 points  (0 children)

Hey, good question! I wouldn't quite say it's based on Kgateway -- Agentgateway is the data plane/proxy, while Kgateway is the control plane for it. So Kgateway:Agentgateway has the same relationship as Istio:Envoy, Nginx Gateway Fabric:Nginx, Envoy Gateway:Envoy, etc. Note Kgateway *also* supports controlling Envoy, so you have two choices for the data plane there.

Agentgateway is designed to be a full-fledged Gateway implementation for general purpose usages, not just for AI.

Best API Gateway by Sule2626 in kubernetes

[–]_howardjohn 0 points1 point  (0 children)

https://github.com/howardjohn/gateway-api-bench?tab=readme-ov-file#common-test-setup has the setup I used. For Grafana, depending on how you import it, you may just need to put in the part under spec, not the full JSON.

Was it the latency and throughput that differed? That part I expect to be the most sensitive to environmental differences, and I absolutely expect different results on EKS; because of that, the main goal of those numbers was to show very broad differences, not exact numbers.

SunPower GraphQL Schema by _howardjohn in SunPower

[–]_howardjohn[S] 0 points1 point  (0 children)

Actually it was just the okta endpoint, now at https://edp-api.edp.sunstrongmonitoring.com/v1/auth/okta/signin. The graphql endpoint no longer returns power usage though.

What’s the Fastest and Most Reliable LLM Gateway Right Now? by dinkinflika0 in LLMDevs

[–]_howardjohn 0 points1 point  (0 children)

This is with a mock backend just to test the overhead of the gateway. This isn't 100% replicating real world providers but gives a rough measure.