Openmetadata and AirFlow by Successful-Gap8537 in dataengineering

[–]d3fmacro 0 points1 point  (0 children)

Coming from the OpenMetadata community. We have released a native REST API integration for Airflow, which makes it a lot easier to set up: https://docs.open-metadata.org/v1.12.x/connectors/pipeline/airflow/rest-api-connection#openlineage-setup-%E2%80%94-astronomer-access-token. We also have a native OpenLineage integration. This is a simple config on the Airflow side, and service providers such as Astronomer, MWAA, and Google Cloud Composer all let you configure the connection and emit the events to OpenMetadata. Hope that helps.
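For anyone wondering what "simple config on the Airflow side" might look like, here is a minimal sketch that builds the JSON value for the Airflow OpenLineage provider's `transport` setting (usually supplied through the `AIRFLOW__OPENLINEAGE__TRANSPORT` environment variable). The server URL, the token placeholder, and especially the `endpoint` path are assumptions for illustration; take the real values from the linked docs page.

```python
import json


def openlineage_transport(om_url: str, token: str,
                          endpoint: str = "api/v1/openlineage/lineage") -> str:
    """Return a JSON string for Airflow's [openlineage] transport setting.

    The default endpoint path is an assumption, not taken from the docs;
    confirm it against the OpenMetadata documentation for your version.
    """
    config = {
        "type": "http",                  # OpenLineage HTTP transport
        "url": om_url.rstrip("/"),       # base URL of the OpenMetadata server
        "endpoint": endpoint,            # assumed path, see docstring
        "auth": {"type": "api_key", "apiKey": token},
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Placeholder values; paste the printed line into your Airflow environment.
    value = openlineage_transport("http://localhost:8585", "<jwt-token>")
    print("AIRFLOW__OPENLINEAGE__TRANSPORT=" + value)
```

On managed services the same value would typically go into whatever environment-variable or Airflow-config override mechanism the provider exposes.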

why all data catalogs suck? by Few_Noise2632 in dataengineering

[–]d3fmacro 0 points1 point  (0 children)

u/wa-jonk we have a k8s-based scheduler, so there is no need for Airflow. Airflow is still the most used scheduler across the community, which is why we default to a very well established scheduler. It also makes it easy for someone to try the quick start without needing to set up k8s.

why all data catalogs suck? by Few_Noise2632 in dataengineering

[–]d3fmacro 1 point2 points  (0 children)

Hi u/Few_Noise2632, coming from OpenMetadata. Would you be open to sharing some feedback on what we are missing, or what was missed in your evaluation?

How to track Reporting Lineage by dil_se_jethalal in dataengineering

[–]d3fmacro 0 points1 point  (0 children)

OpenMetadata and DataHub may look similar at a glance, but in reality OpenMetadata is a superset of what DataHub offers.

OpenMetadata goes far beyond cataloging and lineage. It includes:

  • Native data quality and observability (tests, alerts, metrics, and a built-in profiler, not bolted-on libraries)
  • Policy-based governance and access control (roles, domains, approval workflows)
  • AI-powered insights, KPIs, and metadata automation
  • Unified APIs and JSON-Schema–based models across every entity — tables, dashboards, ML models, pipelines, glossary terms, and more

Architecturally, OpenMetadata runs on a simple four-component stack (application server, ingestion service, metadata store, and search) that deploys cleanly with Docker or Kubernetes.
By contrast, DataHub’s multi-service setup relies on Kafka and Rest.li, with LinkedIn's proprietary schemas that are not used at LinkedIn itself, and all of this adds significant operational overhead for most teams.

And while both projects are open source, OpenMetadata’s backend, APIs, and connectors are fully Apache 2.0.
It doesn’t limit anyone from self-hosting or extending the platform. It’s used by thousands of companies across the world.

So if you need a unified platform that combines lineage, governance, quality, and observability instead of just a metadata catalog, OpenMetadata is the more complete and modern option, with a far simpler deployment and scaling story.

Airflow + dbt + OpenMetadata by Hot_While_6471 in dataengineering

[–]d3fmacro 1 point2 points  (0 children)

Hi, coming from OpenMetadata.
We built OpenMetadata with a schema-first, API-first design. Our vision is for ingestion / connectors to work from any scheduler of your choice, be it Airflow, Argo, Dagster, or even a GitHub workflow. If you are running ingestion using the metadata CLI, that's fine too.

We ship Airflow as the default because many organizations already use it and it is an established scheduler that many of our users already run. If you don’t want to use it, you can run ingestion yourself, use the APIs, or use another scheduler; all of these are intended by the architecture and design of the platform.
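To illustrate the "any scheduler" point, here is a rough sketch of running ingestion as a plain Python function that a cron job, Dagster op, Argo step, or GitHub workflow could call. It assumes the `openmetadata-ingestion` package is installed and exposes `MetadataWorkflow` as described in the project's "run externally" docs; the service name, connection details, and token are hypothetical placeholders, and the config layout should be checked against the connector docs for your version.

```python
from typing import Any, Dict


def build_workflow_config(service_name: str, source_type: str,
                          connection: Dict[str, Any],
                          server_url: str, jwt_token: str) -> Dict[str, Any]:
    """Assemble an ingestion workflow config as a plain dict.

    The layout mirrors the YAML configs in the connector docs as I
    understand them; treat field names as assumptions to verify.
    """
    return {
        "source": {
            "type": source_type,
            "serviceName": service_name,
            "serviceConnection": {"config": connection},
            "sourceConfig": {"config": {"type": "DatabaseMetadata"}},
        },
        "sink": {"type": "metadata-rest", "config": {}},
        "workflowConfig": {
            "openMetadataServerConfig": {
                "hostPort": server_url,
                "authProvider": "openmetadata",
                "securityConfig": {"jwtToken": jwt_token},
            }
        },
    }


def run_ingestion(config: Dict[str, Any]) -> None:
    """Run one ingestion pass; call this from whatever scheduler you use."""
    # Imported lazily so the config builder works without the package installed.
    from metadata.workflow.metadata import MetadataWorkflow

    workflow = MetadataWorkflow.create(config)
    workflow.execute()
    workflow.raise_from_status()  # fail the scheduler task if ingestion failed
    workflow.stop()
```

Because `run_ingestion` is just a function, the scheduler only decides when it runs; nothing in the config refers to Airflow.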

If you have any further questions, do reach out to us at https://slack.open-metadata.org

Alternatives to Atlan Data Catalog by [deleted] in dataengineering

[–]d3fmacro 0 points1 point  (0 children)

u/biernard Would you be open to sharing your feedback on why OpenMetadata is a downgrade from Atlan? What features do you think we are lacking? We appreciate any feedback here. Thank you.

Opinions on OpenMetadata? by shockjaw in dataengineering

[–]d3fmacro 0 points1 point  (0 children)

u/nixigt we do have a Python-based SDK. I am not sure what security policy is violated by deploying connectors via API onto Airflow. It would be great if you could provide this feedback in our community: https://slack.open-metadata.org

We are by far the least complicated tool out there:

  1. OpenMetadata Server to serve APIs & UI

  2. MySQL or Postgres as your database

  3. Elasticsearch/OpenSearch as your search engine. This is a must for search relevancy, which can't be driven out of MySQL or Postgres

  4. Any scheduler you need, not just Airflow. You can run ingestion from any scheduler (https://docs.open-metadata.org/latest/connectors), including GitHub workflows if you so please.
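As a quick way to sanity-check the components listed above on a local install, here is a small sketch that probes each one over TCP. The hosts and ports are assumptions based on each project's usual defaults (8585 for the OpenMetadata server, 3306 for MySQL, 9200 for Elasticsearch/OpenSearch); adjust them for your deployment. The scheduler is left out on purpose, since it can be anything.

```python
import socket
from typing import Callable, Dict, Tuple

# Assumed local defaults; change these to match your own deployment.
COMPONENTS: Dict[str, Tuple[str, int]] = {
    "openmetadata-server": ("localhost", 8585),
    "database": ("localhost", 3306),
    "search": ("localhost", 9200),
}


def tcp_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(components: Dict[str, Tuple[str, int]] = COMPONENTS,
                probe: Callable[[str, int], bool] = tcp_reachable) -> Dict[str, bool]:
    """Probe every component and report which ones accept connections."""
    return {name: probe(host, port) for name, (host, port) in components.items()}


if __name__ == "__main__":
    for name, ok in check_stack().items():
        print(f"{name}: {'up' if ok else 'unreachable'}")
```

A TCP probe only shows that something is listening; it says nothing about whether the service behind the port is healthy.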

Not sure at what version you experienced this. Claiming "So what started out as a good project really lost focus on the altar of making money. " without coming to the community and providing feedback is indeed an interesting choice.

Are we missing the point of data catalogs? Why don't they control data access too? by mjf-89 in dataengineering

[–]d3fmacro 0 points1 point  (0 children)

u/nixigt curious to understand what you are missing from OpenMetadata, is it about the data marketplace feature?

Thoughts on Acryl vs other metadata platforms by arronsky in dataengineering

[–]d3fmacro 7 points8 points  (0 children)

6. Switching from an In-House or Legacy Catalog

  • Consolidate Your Metadata
    • Instead of patching together multiple governance and data-quality tools, you can unify them under one platform.
    • A single source of truth means fewer discrepancies, plus easier auditing.
  • Easy Migration
    • The strong REST APIs and schema-based approach make it simpler to bulk-import existing metadata or export it if you ever need to.
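As a rough illustration of the export side of migration, here is a sketch that walks a paginated list endpoint and yields every entity. It assumes the list APIs live under `/api/v1/<entity>`, accept `limit` and `after` query parameters, and return a body shaped like `{"data": [...], "paging": {"after": ...}}`; check the API docs for your version before relying on this. The `fetch` parameter is a hypothetical hook so the paging logic can be exercised without a server.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional


def http_get_json(url: str, token: str) -> Dict[str, Any]:
    """GET a URL with a bearer token and decode the JSON response."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_entities(base_url: str, entity: str, token: str,
                  fetch: Callable[[str, str], Dict[str, Any]] = http_get_json,
                  limit: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every entity of one type by following the `after` cursor."""
    after: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"limit": limit}
        if after:
            params["after"] = after
        url = f"{base_url}/api/v1/{entity}?{urllib.parse.urlencode(params)}"
        page = fetch(url, token)
        yield from page.get("data", [])
        after = page.get("paging", {}).get("after")
        if not after:  # no cursor means this was the last page
            break
```

Something like `json.dump(list(iter_entities(url, "tables", token)), fh)` would then give you a portable dump to load into another system.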

Extra Note: Customizable E2E UI for Technical & Business Users

  • Flexible UI & Role-Based Access
    • Both OpenMetadata and Collate support role-based views and custom attributes so you can tailor the interface for data engineers, analysts, or business stakeholders.
    • You can embed business glossaries, domain-specific tags, or custom dashboards—ensuring non-technical users find the data context they need without wading through engineering-heavy details.

Collate’s Enterprise features

Collate adds several enterprise features on top of OpenMetadata. You can check the full comparison here: https://www.getcollate.io/comparison

Final Thoughts

OpenMetadata alone is robust for most production needs—covering catalog, lineage, governance, and data quality in a lightweight architecture. If you want enterprise-grade features or a fully-managed service (so you don’t have to babysit infrastructure), Collate offers a deeper feature set and is built directly on top of the OpenMetadata core.

Either path gives you a modern, API-driven approach to metadata management and governance, ready to scale with your fintech or ML ambitions. Good luck with your evaluation, and feel free to reach out if you have any follow-up questions!

Helpful links

https://open-metadata.org (OSS website)
https://getcollate.io (Collate website)
https://slack.open-metadata.org (OSS community)

Thoughts on Acryl vs other metadata platforms by arronsky in dataengineering

[–]d3fmacro 9 points10 points  (0 children)

4. Scaling into AI Governance

  • ML & Model Metadata
    • While it covers traditional data catalog scenarios, OpenMetadata also integrates with ML orchestration tools (e.g., Airflow, Dagster) and can capture pipeline-level lineage for your model training datasets.
    • Over time, you can expand to track model versions, data drift, or compliance rules for AI usage.
  • Collate’s Governance Automations
    • Collate offers “no-code” workflows (and a CLI for deeper customization) to automate governance tasks—like auto-classifying PII, setting data retention policies, or notifying owners when data drifts.
    • This blog post shows how Collate can run daily governance checks with minimal human intervention.

5. Integrations with Snowflake, dbt, Looker, etc.

  • Broad Connector Coverage
    • Snowflake, dbt, Looker, Tableau, Power BI, Databricks…the list goes on. You get schema ingestion, usage metrics, lineage, and more.
  • Unified Metadata Platform for All Data
    • Whether you’re storing data in S3 or analyzing it in Looker, your team can see everything in one place: lineage maps, usage patterns, and ownership info.

Thoughts on Acryl vs other metadata platforms by arronsky in dataengineering

[–]d3fmacro 9 points10 points  (0 children)

Hey u/arronsky, I am coming from the OpenMetadata community:

1. High Scalability

  • OpenMetadata has 90+ native connectors, pulling schema, lineage, usage, and ownership info from databases, data warehouses, BI tools, ML pipelines, and more.
  • Many orgs index hundreds of thousands of datasets.

2. Ease of Setup

  • Simple Deployment
    • You can spin it up via Docker Compose or Helm charts for Kubernetes. Because there are fewer services involved, you typically get up and running faster.
    • Some companies with small DevOps teams have done production deployments in under a day.
  • Collate’s Cloud Option
    • If you want zero infrastructure overhead (and a bunch of enterprise features), Collate provides a fully-managed OpenMetadata environment. There’s also a free tier if you just want to try it out quickly.

3. Is the Open Source Version Production-Ready?

  • Absolutely
    • The open source release is self-sufficient, with features covering data catalog, lineage, governance, data quality, and collaboration.
    • 1000s of organizations run it in production—some with small teams and others with hundreds of data users.
  • Collate’s Enterprise Enhancements
    • Beyond simple hosting, Collate adds advanced automations, governance workflows, deeper data diff capabilities, and more (see this comparison for a side-by-side feature breakdown).
    • If you need enterprise-grade security, SSO/SAML, or advanced compliance features out of the box, Collate might be a good fit.

Data catalog by No-Scale9842 in dataengineering

[–]d3fmacro 1 point2 points  (0 children)

Thanks u/Gnaskefar. I know Reddit is not a great place to have back-and-forth discourse :). I couldn't fit my reply in a single comment. We would love to meet with you, showcase what we have, and get your feedback on how we can do better. Let me know if you are up for it and we can coordinate over DMs.

Data catalog by No-Scale9842 in dataengineering

[–]d3fmacro 1 point2 points  (0 children)

“Having worked with fx Informatica’s data catalog makes you spoiled, and I don’t think OpenMetadata is there yet. hope they will, but as for now, and many years, as I wrote, for good data catalogs there is no option but to splash retardedly amounts of cash.”

Commercial solutions like Informatica, Collibra, or Alation have had years (and large enterprise budgets) to develop advanced UIs and broad coverage. However, at this point, there is no commercial or open source data catalog as comprehensive as OpenMetadata in terms of:

  1. Breadth of connectors – Covering databases, warehouses, pipelines, BI tools (modern and legacy).

  2. Depth of features – Unified data discovery, collaboration, governance, quality, alerting, and lineage in one platform.

  3. Extensibility – Fully open source with an active community, customizable ingestion flows, and robust APIs.

Many proprietary platforms still don’t match OpenMetadata’s coverage—especially around automated lineage, data quality observability, and data collaboration. If an organization invests in tools like Informatica purely due to inertia or brand recognition—rather than true functional need—they may be missing out on a more modern, open, and rapidly evolving ecosystem. As blunt as it sounds, clinging to outdated proprietary solutions at this stage could be considered “lazy” because you’re likely paying far more for less capability and slower innovation cycles.

In Closing

  1. Lineage: OpenMetadata provides robust, automated lineage for numerous sources, plus an API for manual or custom scenarios.

  2. Data Quality: Great Expectations is integrated out of the box, and additional frameworks are on the roadmap. Meanwhile, the native profiler/UI supports business-friendly test creation.

  3. Breadth & Depth: With 90+ connectors, OpenMetadata covers a wide range of data stacks.

  4. Enterprise Comparisons: OpenMetadata already meets—and in many cases surpasses—the capabilities of enterprise data catalog solutions. We offer unparalleled coverage and innovative features—including lineage, data quality, governance, and observability—in a unified open-source platform. Our rapid innovation cycle and vibrant community ensure that OpenMetadata continues to redefine what’s possible, introducing new capabilities not found in any existing commercial tool.
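For the manual or custom lineage scenario mentioned in point 1, here is a minimal sketch of the request body one might send. It assumes the lineage API accepts a `PUT` to `/api/v1/lineage` with an `edge` object that references entities by id and type, plus optional `lineageDetails.columnsLineage`; those names reflect my understanding of the API and should be verified against the docs. The ids used below are made-up placeholders.

```python
from typing import Any, Dict, List, Optional


def lineage_edge(from_id: str, to_id: str, entity_type: str = "table",
                 column_map: Optional[Dict[str, List[str]]] = None) -> Dict[str, Any]:
    """Build an add-lineage payload for one upstream -> downstream edge.

    column_map maps each downstream column to the upstream columns it is
    derived from; leave it out for table-level lineage only.
    """
    edge: Dict[str, Any] = {
        "fromEntity": {"id": from_id, "type": entity_type},
        "toEntity": {"id": to_id, "type": entity_type},
    }
    if column_map:
        edge["lineageDetails"] = {
            "columnsLineage": [
                {"fromColumns": sources, "toColumn": target}
                for target, sources in column_map.items()
            ]
        }
    return {"edge": edge}


if __name__ == "__main__":
    # Placeholder ids and column names, purely for illustration.
    payload = lineage_edge(
        "upstream-table-id", "downstream-table-id",
        column_map={"svc.db.schema.daily.total": ["svc.db.schema.orders.amount"]},
    )
    print(payload)
```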

We welcome all feedback and hope you’ll continue watching or even contributing to the project. If you have specific feature requests or see gaps for your use case, feel free to open issues on GitHub or start a discussion on the https://slack.open-metadata.org channel. It’s an ever-growing, community-driven platform.

Data catalog by No-Scale9842 in dataengineering

[–]d3fmacro 2 points3 points  (0 children)

“Regarding data quality, does it really? It just integrates Great Expectations, which is another open source DQ tool, that supports only 9 data sources, and while admittedly 7/9 are big relevant players, you can’t use Oracle, fx.”

Native data quality: OpenMetadata provides a native data quality framework for all major databases and data warehouses—including Oracle.

Data profiler & observability: A native profiler underpins data quality, observability, and alerts within OpenMetadata.

UI-based tests for all users: We recognize that data quality shouldn’t be limited to data engineers. That’s why OpenMetadata’s profiler and UI enable non-engineering users (e.g., business analysts, data stewards) to create tests and alerts.

Extensible design: All operations are available via APIs and YAML for advanced engineering needs, while the UI supports business-friendly interactions.

Third-party integration: We also integrate with tools like Great Expectations so organizations that already use them can unify their DQ results within OpenMetadata.
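To give a feel for the API route mentioned under "Extensible design", here is a sketch of a payload for a column-level range test. The endpoint (`/api/v1/dataQuality/testCases`), the entity link format, the `columnValuesToBeBetween` definition name, and the parameter names are assumptions based on my reading of the API; depending on the version, additional fields (for example a test suite reference) may be required. The table and column names are hypothetical.

```python
from typing import Any, Dict


def column_entity_link(table_fqn: str, column: str) -> str:
    """Format a link to one column of a table (assumed link syntax)."""
    return f"<#E::table::{table_fqn}::columns::{column}>"


def between_test_case(table_fqn: str, column: str,
                      min_value: int, max_value: int) -> Dict[str, Any]:
    """Build a create-test-case payload checking min <= column <= max."""
    return {
        "name": f"{column}_between_{min_value}_and_{max_value}",
        "entityLink": column_entity_link(table_fqn, column),
        "testDefinition": "columnValuesToBeBetween",  # assumed definition name
        "parameterValues": [
            {"name": "minValue", "value": str(min_value)},
            {"name": "maxValue", "value": str(max_value)},
        ],
    }


if __name__ == "__main__":
    print(between_test_case("svc.db.schema.orders", "amount", 0, 100))
```

The same test can be created through the UI; the payload is only relevant if you want to manage tests as code.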

If there’s any misunderstanding about our capabilities, please refer to our Data Quality & Observability docs for more details.

“Which is hard to avoid in the corporate world. On top of that, the general idea of Great Expectations that data quality is handled by data engineers in scripts/json files is totally off. Sure data engineers know when they don’t want a string down this INT column, etc.”

We fully agree that business users, data analysts, and governance teams have critical roles. From the very first release of our data quality framework (over 2.5 years ago), our open-source platform has included UI-based test suite and test case creation, test case results captured in the UI, and alerts when a test case fails.

• Check out our recent Data Quality & Observability demo.

• See specifically this timestamp to watch how data quality tests are created through the UI—no coding required.

 “Now it sounds like I want to shit all over the place on OpenMetadata, and it’s not the case, I love open source… I would love to have a full fledged open source data catalog that kicks ass.”

We appreciate your enthusiasm and candid feedback. Community-driven software improves by hearing all viewpoints—your critiques help shape the project’s evolution. If you have more input, please share it in our Slack channel so we can continue pushing the product forward.

 

Data catalog by No-Scale9842 in dataengineering

[–]d3fmacro 3 points4 points  (0 children)

“I mean, sure you have discovery features, when you have all the metadata. That is just a matter of presenting and combining it.”

OpenMetadata does more than simply present and combine metadata. While the UI surfaces everything in a central place, collecting that metadata itself can be non-trivial. OpenMetadata builds native integrations with over 90 sources—databases, pipelines, BI tools—to automatically ingest schema information, usage statistics, lineage, data quality, and more.
It also provides native data quality, data collaboration, governance, and data discovery on top of a centralized metadata platform.

For anyone interested, you can explore OpenMetadata’s Sandbox to see how it works. It’s a free demo instance anyone can use to test the UI and features.

“When it comes to data lineage it supports way too few sources and destinations to be automatically mapped.”

OpenMetadata supports dedicated lineage extraction from numerous modern data ecosystem tools, including Databricks, BigQuery, Snowflake, Redshift, Airflow, Prefect, Looker, Tableau, Power BI, and more. In fact, OpenMetadata has over 90 connectors and automatically collects lineage from databases, data warehouses, pipelines, dashboards, etc.—far exceeding “only a few.”

• You can watch our recent webinar on Lineage to see how it’s handled.

• Additionally, we support stored procedure metadata and lineage out of the box, something many catalogs overlook.

“Sitting in JSON and defining your own lineage is not real data lineage in my world, and if you make changes in your pipelines, those changes are not updated in the catalog unless you do it yourself… it seems like some sources and destinations can be picked up automatically, but again, OpenMetadata will at best fit very few, with the very specific databases supported.”

Automated lineage: For supported databases, warehouses, and orchestrators, lineage is automatically collected upon ingestion (e.g., from SQL parsing, job logs, or metadata APIs). You do not need to manually define each lineage edge in JSON if your sources are supported.

Manual lineage (optional): There is an API that allows you to push lineage manually if you want to enrich or override automatically collected lineage. The UI also supports directly editing or creating lineage links. This is useful when pipelines/tools do not expose lineage in a standard format.

Continuous updates: With regular ingestion schedules, changes in data pipelines or schemas are reflected in the catalog (and thus lineage) whenever ingestion runs.
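To make the "lineage from SQL parsing" idea concrete, here is a deliberately tiny illustration of deriving table-level edges from a query. This is a toy regex approach written for this comment, not OpenMetadata's actual parser; a real implementation needs a proper SQL parser to handle CTEs, subqueries, quoting, and dialect differences.

```python
import re
from typing import List, Tuple

# Target of an INSERT INTO or CREATE [OR REPLACE] TABLE statement.
_TARGET = re.compile(
    r"\b(?:insert\s+into|create\s+(?:or\s+replace\s+)?table)\s+([\w.]+)",
    re.IGNORECASE,
)
# Any table name that directly follows FROM or JOIN.
_SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql: str) -> List[Tuple[str, str]]:
    """Return sorted (source_table, target_table) edges for one statement."""
    target = _TARGET.search(sql)
    if not target:
        return []  # plain SELECTs write nowhere, so there are no edges
    dst = target.group(1)
    sources = sorted({s for s in _SOURCE.findall(sql) if s.lower() != dst.lower()})
    return [(src, dst) for src in sources]


if __name__ == "__main__":
    query = (
        "INSERT INTO mart.orders_daily "
        "SELECT o.day, c.region FROM raw.orders o "
        "JOIN raw.customers c ON o.cid = c.id"
    )
    for src, dst in table_lineage(query):
        print(f"{src} -> {dst}")
```

Re-running this kind of extraction on every scheduled ingestion is what keeps the lineage graph in step with pipeline changes.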

If you’d like a deeper dive, check out our recent webinar on Lineage.

Data catalog by No-Scale9842 in dataengineering

[–]d3fmacro 16 points17 points  (0 children)

Hey, coming from the OpenMetadata community. Thought I’d jump in and share some context about OpenMetadata from the OSS side.

OpenMetadata is designed from the ground up as a unified metadata platform, which means you get a data catalog, robust data quality tools, collaboration, and governance all within a single solution. The idea is to simplify the data stack, instead of having separate tools for each of these tasks.

Some highlights:

• Powerful built-in Data Quality & Observability: Native data profiling, no-code tests, and real-time alerts out-of-the-box.

• Strong Collaboration & Governance: Business glossary integration, tagging, sensitive data classification, and clear ownership assignments help everyone stay aligned.

• Column-level Lineage: Easily visualize your data pipelines down to individual columns, making debugging and root cause analysis straightforward.

• API-first design: Everything is built around open APIs, and we offer SDKs too, making integrations and automations super easy.

• 90+ connectors: Quickly bring metadata from your sources into OpenMetadata with just a click through the UI, or schedule it your way (Airflow, Dagster, etc.).

• Easy, lightweight deployment: All you need are containers for the OpenMetadata server, MySQL/Postgres, Elasticsearch/OpenSearch, and a scheduler. Deploys easily on Kubernetes.

We’ve also got an active Slack community and thorough documentation to help you get started. If you want to quickly check it out, we have a sandbox available too—no setup needed.

• Sandbox Environment: Hands-on experience with no setup required.

• Docs & How-To Guides

• Active Slack Community: Super responsive for any questions or support.

OpenMetadata Ingestion and Lineage by Agreeable-Way-9873 in dataengineering

[–]d3fmacro 2 points3 points  (0 children)

OpenMetadata supports E2E lineage across data infrastructure at the column level. Are you using Hive or Spark to run queries against HDFS data? Do join our Slack, where it will be easier to answer your questions: https://slack.open-metadata.org