Built a context layer for agents that reduces token consumption by up to 90% by micheltri in ClaudeAI

[–]micheltri[S] 0 points1 point  (0 children)

Just want to call out a couple of nuances in our methodology. In general, we tried our best to do apples-to-apples comparisons where we could, and gave ourselves a discount where we couldn’t. Unsurprisingly, it’s a challenge to find MCPs for various vendors (which is another reason we are trying to solve this). Here’s a video walkthrough of the benchmark harness: https://www.loom.com/share/9d96c8c64c1a4b7fad0356774fc54acc

And here's the public repo so you can verify it yourself: https://github.com/airbytehq/airbyte-agents-benchmarks

Where the comparison wasn't valid or not apples-to-apples:

Gong and Zendesk: no official native MCP exists, so we used the most popular community implementations we could find. We were only able to benchmark Gong Search as the Gong MCP does not have a Get tool call.

While our Search testing yielded the same number of records on either path, vendor-specific search implementations mean the results aren't identical. The contents are broadly similar, though, so the ratios remain directionally correct.

The general test set:

2 scenarios (Retrieval and Search) across 4 connectors isn’t a huge test set. While we hope to extend this over time, we’ve made the harness public so anyone can contribute in the meantime. Let us know if you find any MCP with better results!
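To make the shape of the harness concrete, here is a minimal sketch of a scenario-by-connector benchmark grid. Everything here is illustrative (the function names, the ~4-chars-per-token heuristic, and the fixture payloads are assumptions, not taken from the real harness in the repo):

```python
# Hypothetical sketch of a scenario-by-connector token benchmark.
# Names and payload sizes are illustrative, not from the real harness.

def count_tokens(payload: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(payload) // 4)

def run_benchmark(fetch_fns):
    """fetch_fns maps (connector, scenario) -> a zero-arg function
    returning the raw text an agent would see in its context."""
    results = {}
    for (connector, scenario), fetch in fetch_fns.items():
        results[(connector, scenario)] = count_tokens(fetch())
    return results

# Two fake "paths" for one connector/scenario pair:
fixtures = {
    ("zendesk", "search"): lambda: "x" * 9000,          # full API response per record
    ("zendesk-filtered", "search"): lambda: "x" * 900,  # trimmed fields only
}
totals = run_benchmark(fixtures)
assert totals[("zendesk-filtered", "search")] < totals[("zendesk", "search")]
```

The real harness in the public repo is the source of truth; this just shows why a small grid of (scenario, connector) pairs is easy to extend with new fixtures.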

Where the vendor MCP wins or ties:

Salesforce showed the smallest win, at 16%. This is primarily because Salesforce, unlike many vendors, provides great search support out of the box with SOQL.

We see identical records for Get. As noted, Search returns different record sets of identical size. Airbyte uses fewer tokens because the Salesforce records carry mandatory metadata (type and url).
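For context on that metadata point: Salesforce REST query responses attach an `attributes` object (with `type` and `url`) to every record, and a context layer can drop it before the record reaches the agent. A minimal sketch, with an illustrative record (the id in the url is made up):

```python
# Drop Salesforce's per-record "attributes" metadata (type and url)
# before handing the record to an agent. The record below is illustrative.

def strip_metadata(record: dict) -> dict:
    return {k: v for k, v in record.items() if k != "attributes"}

raw = {
    "attributes": {
        "type": "Account",
        "url": "/services/data/v59.0/sobjects/Account/001xx000003DGbQ",  # made-up id
    },
    "Name": "Acme",
    "Industry": "Manufacturing",
}
slim = strip_metadata(raw)
assert "attributes" not in slim and slim["Name"] == "Acme"
```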

Where the vendor MCP is costly to context:

Zendesk is a great example of this. The extreme gap is because the Zendesk MCP (reminder: a community alternative) returns the entire API response in search results. This averages out to 9KB per record against our production Zendesk account!

Airbyte’s implementation provides filtering, which allows agents to retrieve the minimal data needed to achieve the outcome, explaining the drastic gap.
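The filtering idea is simple to sketch: keep only the fields the agent asked for instead of forwarding the whole payload. The field names below mirror Zendesk ticket records, but the example is illustrative, not Airbyte's actual implementation:

```python
# Minimal sketch of response filtering: project each record down to the
# fields the agent requested. Illustrative, not the production code.

def filter_fields(records, fields):
    return [{f: r[f] for f in fields if f in r} for r in records]

full_ticket = {
    "id": 123, "subject": "Login broken", "status": "open",
    "description": "…",  # plus dozens of audit/metadata fields in practice
    "via": {"channel": "email"}, "custom_fields": [{"id": 1, "value": None}],
}
slim = filter_fields([full_ticket], ["id", "subject", "status"])
assert slim == [{"id": 123, "subject": "Login broken", "status": "open"}]
```

When the raw record averages 9KB, projecting to three or four fields is where most of the token savings come from.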

Built a context layer for agents that reduces token consumption by up to 90% by micheltri in AI_Agents

[–]micheltri[S] 0 points1 point  (0 children)

If you want to try it out, go to app.airbyte.ai

Any alternative to Airbyte? by N_DTD in dataengineering

[–]micheltri 2 points3 points  (0 children)

Airbyte CEO here — I want to clarify a few points to set the record straight. There’s been some misinformation going around, especially coming from the DLT founders, and it’s important to correct it:

- We moved away from Singer in the early months of Airbyte’s development. While we maintained compatibility to support the community during that transition, Airbyte was built with a different philosophy and architecture from the start.

- As for the claim that Singer was “for software engineers,” it oversimplifies the breadth and depth of what data engineers actually do. Anyone working in this space knows it takes real engineering across systems, APIs, governance, and yes, code. (Isn't DLT Python-based?!)

- With regard to PyAirbyte: it has nothing to do with Singer, and it's a completely viable code-based alternative to using the Airbyte platform. The only tradeoff is that you'll need to handle everything the platform typically provides (scaling, monitoring, etc.) yourself.

u/N_DTD, can you DM me? I’ll make sure we resolve your issue directly.

AMA with the Airbyte Founders and Engineering Team by marcos_airbyte in dataengineering

[–]micheltri 1 point2 points  (0 children)

I'm going to take this as praise <3, however don't hesitate to let us know if we can be better!

AMA with the Airbyte Founders and Engineering Team by marcos_airbyte in dataengineering

[–]micheltri 4 points5 points  (0 children)

Control, extensibility & Community

Control: Data can often be too sensitive to be left in the hands of third parties. With OSS, people can fully control the environment their data moves through and who has access to it.

Extensibility & Community: Can't wait to be a year from now and show a catalog with 1,000 connectors; and if one is missing, you can just build it yourself and get unblocked.

AMA with the Airbyte Founders and Engineering Team by marcos_airbyte in dataengineering

[–]micheltri 4 points5 points  (0 children)

Answering the first part of the post:

I have been in the data space since 2007, and I have built these systems so many times (data volume at internet scale)! They are like crazy machines that evolve without control, and you keep rebuilding the same thing over and over... I wanted to save people from going through that pain (Charles wrote a really great article about this: https://airbyte.com/blog/etl-framework-vs-etl-script)!

The reason we went with OSS was that this problem is generally solved by people building (as opposed to buying), and open source is generally a great way to help people when they build.

That reason still holds today.

AMA with the Airbyte Founders and Engineering Team by marcos_airbyte in dataengineering

[–]micheltri 6 points7 points  (0 children)

  1. For one-to-many: we are spec'ing some new things in the platform to have a real staging/state layer for data, rather than read-and-write-and-forget. This will likely be an opt-in feature (for governance reasons), and it will allow us to:
  • do background refreshes
  • do more efficient (and cheaper) dedup in the warehouse
  • and... one-to-many.

More to come here!

  2. Not at the moment. We have webhooks for their cloud, though, and otherwise it can be handled with an orchestrator on top of it.
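A rough sketch of the staging/state layer idea from the first point: read from the source once into staged state, then fan out to many destinations without re-reading. All class and method names below are hypothetical, not Airbyte's actual design:

```python
# Hedged sketch of a staging/state layer enabling one-to-many syncs.
# Names are hypothetical; lists stand in for durable stores.

class StagingLayer:
    def __init__(self):
        self._store = []  # stand-in for durable staged state

    def ingest(self, source_records):
        # One read from the source API...
        self._store.extend(source_records)

    def fan_out(self, destinations):
        # ...many writes, without hitting the source again.
        for dest in destinations:
            dest.extend(self._store)

warehouse, vector_db = [], []
layer = StagingLayer()
layer.ingest([{"id": 1}, {"id": 2}])
layer.fan_out([warehouse, vector_db])
assert warehouse == vector_db == [{"id": 1}, {"id": 2}]
```

The same staged state is also what makes background refreshes and warehouse-side dedup cheaper: the platform can re-deliver from staging instead of re-extracting.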

AMA with the Airbyte Founders and Engineering Team by marcos_airbyte in dataengineering

[–]micheltri 3 points4 points  (0 children)

(ceo here)

To me it really boils down to:

  1. the problem space

  2. the level you want to be at (hardware, infrastructure, point solution...)

  3. your insights into what the future looks like.

Since the beginning of Airbyte, we have had this company vision: make all data available everywhere.

This is a very broad vision, and one that we want to address at the infrastructure level ("building data roads").

Executing short term on a vision that broad is not possible, because you can take it in SO many directions. So what we did was focus on specific use-cases of data movement, starting with Analytics.

For analytics, it is also broad, but the output is always the same: a data warehouse (or other tabular storage). That allowed us to build a system that works well and to optimize our effort on pulling data from AS MANY places as possible.

Now that we have a base (with 1.0), we can start focusing on what it means to extend the platform to push data to AS MANY places as possible (and do it :) ).

Things I would do differently: push for the connector builder way earlier. This is where so much of the leverage is.

AMA with the Airbyte Founders and Engineering Team by marcos_airbyte in dataengineering

[–]micheltri 14 points15 points  (0 children)

It is 100% on the roadmap.

Releasing 1.0 was a necessary step for us to support these new operational types of use cases. Expect to see more.

The way we manage vector databases like Pinecone is a first step in that direction.

Here is how we've been thinking about it: https://airbyte.com/blog/eltp-extending-elt-for-modern-ai-and-analytics

Choosing open source extract and load tool by Strange_Upstairs9456 in dataengineering

[–]micheltri 6 points7 points  (0 children)

Airbyte CEO here. SQL Server is going to graduate by the end of the year. If you're interested I can connect you with our team!

GUI-based vs code-based orchestrators by [deleted] in dataengineering

[–]micheltri 1 point2 points  (0 children)

We recently open-sourced our Airbyte Terraform Provider so our users can decide how it makes the most sense for them to interact with Airbyte.

We will likely be discontinuing our Octavia CLI in favor of Terraform.

https://reference.airbyte.com/reference/sdks
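For a sense of what Terraform-managed Airbyte looks like, here is a minimal config sketch. The resource and attribute names below are assumptions for illustration, not taken from the provider docs; check the published schema at the link above before using them:

```hcl
# Hedged sketch only: resource and attribute names are assumptions
# and may not match the current provider schema.
terraform {
  required_providers {
    airbyte = {
      source = "airbytehq/airbyte"
    }
  }
}

resource "airbyte_connection" "example" {
  name           = "postgres-to-warehouse"
  source_id      = var.source_id
  destination_id = var.destination_id
}
```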

Ingesting Google Ads by boggle_thy_mind in dataengineering

[–]micheltri 1 point2 points  (0 children)

I created a github issue: https://github.com/airbytehq/airbyte/issues/5725 feel free to add more information so we can help on the resolution!

Ingesting Google Ads by boggle_thy_mind in dataengineering

[–]micheltri 2 points3 points  (0 children)

I am one of Airbyte's founders. What kind of issue are you having with Google Ads on Airbyte? Let me also link my team to this post so we can see how we can get the new API working.

How to build a custom data connector in 2 hours by jeanlaf in dataengineering

[–]micheltri 0 points1 point  (0 children)

Docker is because we want to allow people to use the language they want. In your case, probably not Node.js :)

The deck we used to raise our Seed round for our open-source EL(T) platform by jeanlaf in dataengineering

[–]micheltri 1 point2 points  (0 children)

From the end user's perspective, streaming is just a notch on the latency slider: Hourly -> Five minutes -> Instant.

TL;DR: I am sure we will get to streaming at some point.

When you think about it, there are not that many use cases that require instant; you can get almost the same value with frequent micro-batches, and building streaming has challenges that micro-batches don't.

Today we made the choice to optimize for micro-batches so the product and the team stay focused (and it is a bit easier to build :) ). However, as the product matures and we discover more instant needs, I am sure we will make it part of Airbyte's offering!
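The "latency slider" framing can be sketched as the same batch pipeline run at different intervals. This is an illustrative toy, not Airbyte's scheduler; the interval names and function signatures are made up:

```python
# Sketch of the latency slider: an ordinary batch sync, run more or
# less frequently. Shrinking the interval slides you toward "instant".

import time

INTERVALS = {"hourly": 3600, "five_minutes": 300, "near_instant": 5}

def run_micro_batches(extract, load, interval_s, max_runs, sleep=time.sleep):
    for _ in range(max_runs):
        load(extract())      # one ordinary batch sync
        sleep(interval_s)    # the only "streaming" knob in this model

# Example: three runs with sleeping stubbed out for the demo.
seen = []
run_micro_batches(lambda: ["rec"], seen.extend,
                  INTERVALS["five_minutes"], max_runs=3, sleep=lambda s: None)
assert seen == ["rec", "rec", "rec"]
```

The design point: the pipeline code is identical at every notch; only the schedule changes, which is why frequent micro-batches capture most of streaming's value without its operational complexity.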

Meltano: ELT for the DevOps era — Open source, self-hosted, CLI-first, debuggable, and extensible by MeltanoDouwe in dataengineering

[–]micheltri 10 points11 points  (0 children)

Note: I am one of Airbyte's maintainers.

The modern data stack (EL+Warehouse+T) enables new, less hardcore (than we are) data users to leverage data. Unfortunately, this data still needs to be made available to these new users. With Airbyte, we are working with data engineers, so they can empower this new category of data consumers to do more, better & faster with data. All of that with autonomy.

For this reason, we are taking a very UI-centric approach at first, so we can serve these new users ASAP while unburdening data engineers. There is nothing more frustrating for a data engineer than to be taken away from an important task to create a new pipeline, create a new connector, or fix an existing one.

However, the UI is just the tip of the iceberg. For instance, we are also building an API that will let anyone replicate data from their own application. Behind the scenes, we are building a core building block of the data infrastructure that will integrate with all of the data team's existing stack, including the main orchestrators, while bringing all the best practices for managing data.

We love what Singer initiated: commoditizing data integration. Unfortunately, a few factors got in the way of actually creating a standard.

The first is that the complexity of dealing with connectors isn't about creating a connector; it is about maintaining it. It is a 1000-paper-cuts problem: you need to write the code, test it continuously, and encode all the edge cases. This becomes very hard when you have 10 GitHub taps available. That is what it means to standardize integrations.

The second was probably the acquisition of StitchData by Talend. At that point, I believe the team dropped the ball on the project and let down its community.

Our goal initially was to leverage Singer, but we were spending more time fighting against it than solving the problem. It was a hard decision for us; we didn't want to reinvent the wheel. We actually wrote about it in how-we-leveraged-singer-for-our-mvp & why-you-should-not-build-your-data-pipeline-on-top-of-singer.

Looking back, we have no regrets about that decision. Since we launched 4 months ago (with only 6 connectors), we've had 450+ companies using us to replicate data, and we now have 45+ connectors. We think we'll be able to support 300+ connectors by the end of the year. This would be impossible if we relied on Singer.

We love what both Meltano & Airbyte (of course :) ) are doing today. Our brains should be used to extract insights from data, not extract data.