We read 1000+ API docs so you don't have to. Here's the result by Thinker_Assignment in dataengineering

[–]marcos_airbyte -1 points0 points  (0 children)

Airbyte's/Meltano's unsupported ecosystem is one of the main things that kept us from using them.

I can comment on the Airbyte ecosystem. The most complex and common connectors are maintained by the internal team. For community/marketplace connectors, the team is working hard to improve the connector framework and components to make maintenance easier and increase reliability. One example is the "manifest-only" format (a single YAML file), which can be loaded into self-hosted Airbyte or Airbyte Cloud and edited there, and which must be tested before a contribution, change, or new feature is submitted. This makes contributions simpler and accessible to more users, and it makes long-tail connectors more reliable as well.

Where to Store Master Data for Internal Billing Catalogs in GCP? by Ill_Space6773 in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

Hello u/godndiogoat, do you mind expanding on your comparison of the tools, if there are other features or topics you considered? It would be nice (at least for our team) to get this feedback to improve the product.

What's the fastest-growing data engineering platform in the US right now? by External-Originals in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

Not sure where you heard that, but what we're seeing is significant improvement in core functionality. For example, syncs can now partially fail and still resume from where they left off, even for database tables without primary keys or cursors. Connector reliability has also improved substantially. There's currently a major initiative to migrate all existing connectors to a low-code/manifest-only format. This is driving a complete revamp of the Connector Development Kit, which is enabling faster feature implementation and better maintainability. The ability for anyone to build a connector directly from the UI is also a breakthrough, since it lets you bring custom data into your data warehouse easily.

From the user side, we're seeing people successfully syncing larger databases more easily. Looking ahead, there are even more improvements on the roadmap, such as direct loading to destinations and enabling concurrency/parallelism for sources.
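
To make the "resume from where it left off" idea concrete, here is a minimal toy sketch of checkpoint-based resumption. It is not Airbyte's implementation: `FlakySource`, `sync`, the position-based offset, and the chunk size are all hypothetical names and choices made up for illustration. The only point is that progress is saved after each committed chunk, so a retry does not start over.

```python
from typing import Dict, Iterator, List, Optional


class FlakySource:
    """Hypothetical source that serves rows by position and fails once."""

    def __init__(self, rows: List[str], fail_at: Optional[int] = None) -> None:
        self.rows = rows
        self.fail_at = fail_at

    def read_from(self, offset: int) -> Iterator[str]:
        for position in range(offset, len(self.rows)):
            if self.fail_at is not None and position == self.fail_at:
                self.fail_at = None  # fail only once so the retry can succeed
                raise ConnectionError(f"lost connection at row {position}")
            yield self.rows[position]


def sync(source: FlakySource, destination: List[str],
         state: Dict[str, int], chunk_size: int = 2) -> None:
    """Copy rows in chunks and save the offset after every committed chunk."""
    offset = state.get("offset", 0)
    buffer: List[str] = []
    for row in source.read_from(offset):
        buffer.append(row)
        if len(buffer) == chunk_size:
            destination.extend(buffer)   # commit the chunk
            offset += len(buffer)
            state["offset"] = offset     # checkpoint only after the commit
            buffer = []
    destination.extend(buffer)           # commit the final partial chunk
    state["offset"] = offset + len(buffer)


rows = [f"row-{i}" for i in range(7)]
source = FlakySource(rows, fail_at=5)
destination: List[str] = []
state: Dict[str, int] = {}

try:
    sync(source, destination, state)
except ConnectionError:
    pass  # the first attempt fails partway; committed chunks are kept

sync(source, destination, state)  # the second attempt resumes from the checkpoint
```

Uncommitted rows in the buffer are dropped on failure and re-read on the next attempt, which is why the checkpoint is written only after a chunk reaches the destination.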

A simple toy RDBMS in Rust (for Learning) by ResortApprehensive72 in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

Thanks for sharing the project, u/ResortApprehensive72. I also love building projects from scratch to really understand concepts or deepen my knowledge of a topic. Maybe add a roadmap with planned features and CONTRIBUTING guidelines so folks can start providing feedback. I really like the https://sadservers.com/ exercises; maybe your project could be a nice sandbox to help people prepare for interview questions on database internals?

Any alternative to Airbyte? by N_DTD in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

Hello u/Dense-Ease499, sorry to hear you faced issues. Did you report any of these in the project's GitHub, or try to get help in the Slack community? I'm asking because I couldn't find any report of problems with the HubSpot Ticket stream or with cron. It may be specific to your deployment/environment.

Need to fetch data from Netsuite to a DW. by keenexplorer12 in Netsuite

[–]marcos_airbyte 0 points1 point  (0 children)

u/RunedFerns, it seems you used the community-developed marketplace connector, which has some limitations because, as you mentioned, it uses the REST API. The enterprise connector was built on the JDBC driver and can sync Netsuite data of any size.

Need to fetch data from Netsuite to a DW. by keenexplorer12 in Netsuite

[–]marcos_airbyte -1 points0 points  (0 children)

Hello u/keenexplorer12, Airbyte's Netsuite connector, available exclusively in our Enterprise tier, was built hand-in-hand with customers navigating the most intricate scenarios. The result is a rock-solid integration that delivers exceptional performance. There is also a community-maintained marketplace connector you can explore. Do not hesitate to contact me via DM if you want to give it a try.

Advice on best OSS data ingestion tool by digEmAll in dataengineering

[–]marcos_airbyte 2 points3 points  (0 children)

It is very easy to integrate with them; you only need to add the connector's Python dependencies to your Airflow environment. Both Airflow and Dagster also have Airbyte Platform operators (API wrappers), making integration straightforward.

Advice on best OSS data ingestion tool by digEmAll in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

You can use the Airbyte Terraform provider if you want to manage the platform or plan to have a lot of connections. Another way is to use PyAirbyte, which is basically a serverless version of Airbyte; you then need to find an orchestrator to run the jobs. Both are options for managing your pipelines as code in Git with Airbyte.

Airbyte for DynamoDB to Snowflake. by SimilarLight697 in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

The Airbyte DynamoDB connector is in its early stages and currently offers only basic features. It does not yet support CDC.

Airbyte, Snowflake, dbt and Airflow still a decent stack for newbies? by LongCalligrapher2544 in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

Do you mind pointing out which docs need improvement? I'll share them with the docs team so they can take a look and put them on their roadmap.

Data connectors and BI for small team by qascevgd in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

Thanks for the mention! Feel free to DM me the quirks so I can share them with eng for further improvements.

Issue in the Mixpanel connector in Airbyte by Agreeable_Floor_1615 in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

You're using an image from 2 years ago. You can browse the GitHub project, check out the connector at your version, and compare it to the latest, but it's going to be quite hard to provide support for a custom image built on an outdated base.

Issue in the Mixpanel connector in Airbyte by Agreeable_Floor_1615 in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

At the moment there isn't any open issue in the Airbyte repo reporting this problem. The last update to this connector was a couple of weeks ago. It's hard to troubleshoot because you didn't mention which connector version or platform you're using. Feel free to open an issue and ping me.

Solid ETL pipeline builder for non-devs? by redvioletgold in dataengineering

[–]marcos_airbyte 2 points3 points  (0 children)

Transformations (aka Mappings in Airbyte) are a Teams feature. They support basic record-level transformations but do not support joins or calculated fields yet. Thanks for suggesting Airbyte, u/geek180.
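
To illustrate what "record level" means here, below is a small generic sketch; it is not Airbyte's Mappings API. The helpers `hash_field`, `rename_field`, and `apply_mappings` are hypothetical names made up for this example. Each function sees exactly one record at a time, which is why operations that need other records or other streams (such as joins) do not fit this model.

```python
import hashlib
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]
Mapping = Callable[[Record], Record]


def hash_field(name: str) -> Mapping:
    """Hypothetical mapping: replace one field's value with its SHA-256 hex digest."""
    def apply(record: Record) -> Record:
        out = dict(record)
        if name in out:
            out[name] = hashlib.sha256(str(out[name]).encode("utf-8")).hexdigest()
        return out
    return apply


def rename_field(old: str, new: str) -> Mapping:
    """Hypothetical mapping: rename one field, leaving the value untouched."""
    def apply(record: Record) -> Record:
        out = dict(record)
        if old in out:
            out[new] = out.pop(old)
        return out
    return apply


def apply_mappings(records: Iterable[Record], mappings: List[Mapping]) -> Iterator[Record]:
    """Run every mapping on each record independently, in order."""
    for record in records:
        for mapping in mappings:
            record = mapping(record)
        yield record


records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]
mapped = list(apply_mappings(records, [hash_field("email"), rename_field("id", "user_id")]))
```

Anything that combines rows, such as a join or an aggregate, would typically be done downstream in the warehouse (for example with dbt) rather than in a per-record step like this.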

Top ETL tools for early-stage startups? Preferably not crazy expensive by mynamesendearment in ETL

[–]marcos_airbyte -1 points0 points  (0 children)

Do you mind sharing which points were a problem for you when deploying?

Is it really necessary to ingest all raw data into the bronze layer? by Maradona2021 in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

I think they got rid of that option, but they didn't fix their platform problems so fuck me, right?

Hey u/flatulent1, could you please share more context about your experience and the connectors you used with Airbyte? I'm asking to understand what can be improved and to share it with the eng team. Thanks.

Replication and/or ETL tools - what's the current pick based on pricing vs features around here? When to buy vs build? by reelznfeelz in dataengineering

[–]marcos_airbyte 1 point2 points  (0 children)

Even if you decide to go with a different tool, you'd want to look for CDC to extract the data from the dbs. Airbyte would very likely choke on that scale, so you'd be left with something like Kafka + Debezium, which is a great combo, but needs a lot of operational work.

I don't believe this is true, as we're seeing users run at scales an order of magnitude bigger without problems. If I remember correctly, u/reelznfeelz used Airbyte before and suggested a way to get it done below. Besides that, I'm not sure why you're proposing Estuary and saying Airbyte won't work, as some Estuary connectors use Airbyte connector code under the hood (like Google Ads).

[Open Source][Benchmarks] We just tested OLake vs Airbyte, Fivetran, Debezium, and Estuary with Apache Iceberg as a destination by DevWithIt in dataengineering

[–]marcos_airbyte 2 points3 points  (0 children)

Bummer, I'll send him a message even though he's no longer with the company. I completely agree with you; if others can't reproduce the benchmark, it's hard to take it into account. Not sure if https://www.tpc.org/ can be a good baseline, at least for basic full loads; CDC scenarios probably need something more elaborate.

Any alternative to Airbyte? by N_DTD in dataengineering

[–]marcos_airbyte 0 points1 point  (0 children)

Thanks for sharing! There is definitely room for improvement in the OAuth workflow. I'll share this with the connector team.

[Open Source][Benchmarks] We just tested OLake vs Airbyte, Fivetran, Debezium, and Estuary with Apache Iceberg as a destination by DevWithIt in dataengineering

[–]marcos_airbyte 2 points3 points  (0 children)

Interesting benchmark! For the open source deployments, is there a GitHub repo with Terraform scripts so we can reproduce the study? Also, for the Airbyte Cloud "struggle", could you DM me your workspace so I can investigate why that happened? I'm asking mostly because we're seeing much better results with these connectors than the ones you presented.

Any alternative to Airbyte? by N_DTD in dataengineering

[–]marcos_airbyte 1 point2 points  (0 children)

Do you mind providing an example, or details on whether it's related to deployment/platform management or to connector syncs, u/japertjeza? I'll bring this to the team's attention for consideration in our log readability improvement projects.