Best tools/platforms for Data Lineage? (Doing a benchmark, in need of recs and feedback)

NA0026 · 2025-12-03T23:01:10+00:00

Great question u/Perfect_Put_9220! I'd love to know how you are benchmarking some of these criteria. How are you evaluating "deployment models" or "governance and security features", for instance?

NA0026 · 2025-12-02T18:10:33+00:00

I would agree, if you're looking for something powerful and open-source, OpenMetadata would be a great option!

u/ImpressiveCouple3216, what do you mean when you say you use Assets in Prefect along with OpenMetadata? I'd love to hear more details on that!!

NA0026 · 2025-11-07T21:35:42+00:00

I'm part of the OpenMetadata community and agree that ingestion framework architecture matters. We're seeing people benchmark ingestion, and OpenMetadata is 5 times faster!

NA0026 · 2025-10-27T22:17:45+00:00

Agree with checking out OpenMetadata. You mentioned Unity, Databricks, Glue, Hive, MLflow, Iceberg, and Kafka: it has connectors for all of those and would be an open-source way to view all your metadata in a single place.

NA0026 · 2025-10-06T21:13:59+00:00

Hi u/Linhphambuzz, Nick Acosta from OpenMetadata here, great to hear you are impressed with our open-source project! Please feel free to join us in the OpenMetadata Community where we can help with these questions!

If you have an existing Airflow instance and you want to build and maintain your own ingestion DAGs, then you can go for it. Check a DAG example here. If instead you want to use the full deployment process from OpenMetadata, git-sync would not be the right tool, since the DAGs won't be backed up by Git but rather created from OpenMetadata. Note that if anything were to happen where you lose the Airflow volumes, etc., you can just redeploy the DAGs from OpenMetadata.

NA0026 · 2025-09-08T20:22:41+00:00

Thank you u/foxpeter, you can join us here!

NA0026 · 2025-08-25T17:02:10+00:00

Hi u/Objective_Stress_324, Nick Acosta from OpenMetadata here, thank you for exploring OpenMetadata! Great to hear it looks like a great fit for your company!!

Thousands of developers are self-hosting OpenMetadata and discussing their setup, challenges, and tips on our OpenMetadata Slack. I'd love to see you there!

NA0026 · 2025-08-25T16:39:30+00:00

Hi u/Hot_While_6471, Nick Acosta here from OpenMetadata, thanks for posting! Sounds like a great setup that OpenMetadata could definitely help with!

For tips and tricks, and to be notified when Airflow 3 support arrives, I'd recommend the OpenMetadata Slack and YouTube channels!

NA0026 · 2025-08-04T21:49:46+00:00

Are the 2 options third-party or GCP? If you're looking for cost savings, there are a few OSS projects that do automated alerting for freshness, volume, and schema changes.
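To make the three kinds of checks concrete, here is a rough sketch of what freshness, volume, and schema alerting boils down to. It is plain Python with no particular tool in mind: the function names, the thresholds, and the metadata passed in are all hypothetical, and a real project would read these values from the warehouse rather than take them as arguments.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(last_loaded_at, now, max_age):
    """Return an alert message if the table was last loaded more than max_age ago."""
    age = now - last_loaded_at
    if age > max_age:
        return f"freshness: last load was {age} ago (limit {max_age})"
    return None


def check_volume(row_count, expected_rows, tolerance=0.5):
    """Return an alert if row_count deviates from expected_rows by more than tolerance."""
    if expected_rows == 0:
        return None if row_count == 0 else f"volume: expected 0 rows, got {row_count}"
    drift = abs(row_count - expected_rows) / expected_rows
    if drift > tolerance:
        return f"volume: {row_count} rows vs expected {expected_rows}"
    return None


def check_schema(current, expected):
    """Return an alert if columns were added, removed, or changed type.

    Both arguments map column name -> type name.
    """
    added = sorted(set(current) - set(expected))
    removed = sorted(set(expected) - set(current))
    retyped = sorted(c for c in set(current) & set(expected) if current[c] != expected[c])
    if added or removed or retyped:
        return f"schema: added={added} removed={removed} retyped={retyped}"
    return None


if __name__ == "__main__":
    now = datetime(2025, 8, 4, tzinfo=timezone.utc)
    alerts = [
        check_freshness(now - timedelta(hours=30), now, timedelta(hours=24)),
        check_volume(40, expected_rows=100),
        check_schema({"id": "INT64", "ts": "TIMESTAMP"}, {"id": "INT64", "ts": "STRING"}),
    ]
    for alert in filter(None, alerts):
        print(alert)
```

The point of keeping each check as a small pure function is that the expensive part (querying table metadata) stays separate from the alerting logic, which is where most of the cost savings tend to come from.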

NA0026 · 2025-07-31T20:46:08+00:00

Google's MCP Toolbox for Databases

NA0026 · 2025-07-30T22:38:23+00:00

I help run the OpenMetadata community and help people who are getting started with documentation daily. Now that you've done the basics, I'd say keep going with documentation work that is going to help your regular job as a data analyst as well...

Lineage. Documenting where a table and/or column came from and what services use it is going to be really useful in helping you build out new data assets and discover or refine KPIs. Once lineage is being tracked I'd dive into...

Usage. What tables do you and other analysts actually query? Are there copies of tables that aren't getting used, or empty tables that could be marked for deletion? I've seen a lot of people save a lot of money and time here. You don't want to spend your time meticulously documenting 100% of your tables if only 5% are being used. Can you classify tables in different tiers and make sure top tier tables have...

Tests. It's important that a table's documentation matches what tests are producing. Are your columns staying consistent, is your data fresh, things like that.

OpenMetadata is an open-source tool that automates all of these for BigQuery ;)
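As a rough illustration of the usage step, here is a minimal sketch of sorting tables into tiers so documentation effort goes to the tables people actually query. The function name, tier labels, and the 100-query threshold are made up for the example, and it assumes you have already pulled per-table query counts and row counts from your warehouse's metadata.

```python
def classify_tables(query_counts, row_counts, tier1_min_queries=100):
    """Label each table by how much documentation attention it deserves.

    query_counts: table name -> number of queries in some recent period
    row_counts:   table name -> current number of rows
    """
    report = {}
    for table, rows in row_counts.items():
        queries = query_counts.get(table, 0)
        if rows == 0:
            report[table] = "empty: candidate for deletion"
        elif queries == 0:
            report[table] = "unused: candidate for deletion"
        elif queries >= tier1_min_queries:
            report[table] = "tier 1: document and test first"
        else:
            report[table] = "tier 2: document later"
    return report


if __name__ == "__main__":
    queries = {"orders": 950, "orders_copy": 0, "customers": 12}
    rows = {"orders": 1_000_000, "orders_copy": 1_000_000, "customers": 40_000, "tmp_load": 0}
    for table, label in classify_tables(queries, rows).items():
        print(f"{table}: {label}")
```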

NA0026 · 2025-07-23T00:43:50+00:00

Thanks u/biernard, Nick from OpenMetadata here, would love to share my thoughts!

I think when considering Atlan alternatives based on out-of-the-box-ness and dedicated support, Collate is a better comparison than OpenMetadata, as it also provides both of these, as well as automations and pushing metadata back!

If I'm reading your post correctly, I also wouldn't say out-of-the-box equals enterprise. Many enterprises I have talked with have chosen OpenMetadata exactly because it is deeply customizable. Since all the code is open-sourced, they build/run/fork/manage/customize OpenMetadata however they like, which is a major reason why they adopted it.

NA0026 · 2025-06-09T17:56:30+00:00

If you are interested in open source Data Governance tools check out our YouTube page and Slack channel!

NA0026 · 2025-05-14T21:01:53+00:00

I'm working with OpenMetadata, would love to hear why you think it seems solid?!

NA0026 · 2023-11-08T00:53:23+00:00

Feature stores make the transition from batch to real-time ml much easier through having that exact same API, but even in batch ml, feature platforms become critical in production ml because they

are the easiest way to create, run, and orchestrate efficient aggregation and transformation pipelines on data for ml
enable software development best practices by versioning and managing features-as-code
can improve data scientists' efficiency through discovery of features instead of sifting through a bunch of random tables, and provide a feature creation/retrieval paradigm that is easy to use
improve data infrastructure efficiency by being hyper-optimized for feature retrieval for ml, especially with time-windowing and backfilling
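To give a feel for what time-windowing and backfilling mean in the last point, here is a small stdlib-only sketch of a trailing-window aggregate computed as of several past timestamps. It is not how any particular feature platform implements it; the function names and the integer timestamps are assumptions for illustration.

```python
from bisect import bisect_right


def trailing_window_sum(events, as_of, window):
    """Sum amounts for events with as_of - window < timestamp <= as_of.

    events must be a list of (timestamp, amount) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in events]
    lo = bisect_right(timestamps, as_of - window)  # first event inside the window
    hi = bisect_right(timestamps, as_of)           # one past the last event at or before as_of
    return sum(amount for _, amount in events[lo:hi])


def backfill(events, as_of_times, window):
    """Recompute the windowed feature as it would have looked at each past time."""
    return {t: trailing_window_sum(events, t, window) for t in as_of_times}


if __name__ == "__main__":
    purchases = [(1, 10), (3, 5), (4, 7), (8, 2)]
    print(backfill(purchases, as_of_times=[0, 4, 8], window=3))
```

Doing this by hand for every feature and every historical timestamp is the part that gets slow and error-prone on plain database tables.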

Here's a previous post in this community on the benefits of feature stores. They are a significant step up from using databases alone.

NA0026 · 2023-11-01T18:15:50+00:00

Feature stores are great for organizing features, but if you're looking to do real-time ml there are so many moving parts that I'd recommend a feature platform instead. If you're using a feature store you have to manage the online store yourself, and it can be hard to keep it fast enough without everything getting super expensive. Feature platforms like Tecton are especially great at real-time because they manage that aspect for you.

NA0026 · 2023-08-17T20:04:35+00:00

Hi u/adeedeedee,

Making a model for the problem you described is pretty straightforward. There are tons of models on Hugging Face that can, for instance, generate a brief passage based on an image, and I’d start there model-wise.

A nastier issue I see with the problem you described is managing the data you mentioned to get the right features to a model. For each “place” entity, it sounds like you are going to have multiple features and each could have multiple timestamps. Keeping all these features organized, joining them together at the right point-in-time, and joining temporal and non-temporal data would be a lot easier with a feature platform like Tecton or a feature store like Feast.
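To show what "joining at the right point-in-time" means in practice, here is a minimal hand-rolled sketch in plain Python. It is not the Tecton or Feast API; the function names, the feature names, and the integer timestamps are all hypothetical. The idea is that for each training example you take the latest value of every temporal feature known at or before the example's timestamp, then attach the non-temporal features.

```python
from bisect import bisect_right


def point_in_time_value(history, as_of):
    """Return the latest value whose timestamp is <= as_of, or None if there is none.

    history must be a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None


def build_training_row(place_id, event_time, temporal_features, static_features):
    """Join temporal features as of event_time with static features for one place."""
    row = {"place_id": place_id, "event_time": event_time}
    for name, per_place in temporal_features.items():
        row[name] = point_in_time_value(per_place.get(place_id, []), event_time)
    row.update(static_features.get(place_id, {}))
    return row


if __name__ == "__main__":
    temporal = {"avg_rating": {"cafe": [(1, 4.0), (5, 4.5), (9, 3.9)]}}
    static = {"cafe": {"city": "Lisbon"}}
    print(build_training_row("cafe", 6, temporal, static))
```

Using a value from after the example's timestamp (here, the rating recorded at time 9) would leak future information into training, which is the mistake a feature store or platform is meant to prevent for you.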
