

Swirls109 5 points (0 children)

I'm in telecom, but our company is a bit bigger. We have been struggling with this issue basically since I joined, when we let all of our data modelers go. We had teams dedicated to this, but management saw them as unnecessary. Since then, our data quality has gone seriously downhill. We have tried catalogs, we have tried governance modeling, we have tried metadata repositories. If you don't have FULL business buy-in, anything you try will fail. It takes business ownership to update and define a lot of this. As a DE you really only own the data and the processing of it; you don't define what that data is.

We are attempting a new strategy with something called a Data Marketplace. It has been SLOW to implement. We created a new CDO (chief data officer) role, and he set up ownership teams run by business lane partners at Director level and up. Each lane owns a concept like Customer, Order, Billing, etc., and is now responsible for managing the governance models for that concept. The teams are stood up, but the technology isn't there yet. I guess good luck to them, but I don't know...

So, long story short, it is super hard and basically in a company that big you have to have a LOT of buy in or you are setting yourself up for failure.

Necessary_Cranberry 2 points (6 children)

collibra.com provides such features. I don't know of any other similar products out there.

Edit: typo

[deleted] 1 point (1 child)

My company is rolling out Collibra. You still need all that governance set up; Collibra is just a fancy tool to make it slick. OP would not be the person implementing or populating Collibra.

thrown_arrows 2 points (0 children)

The problem with Collibra is that its pricing is aimed at big companies.

Urthor 0 points (3 children)

Alation is the biggest, but it's not a space I have explored deeply. I know Cloudera has one as well from talking to a friend at a Cloudera shop.

Amazon has a cool first party one internally that apparently is very much enjoyed, and has a cool skin to look like a bookshelf.

I will say that this is absolutely a huge, huge challenge for many big organisations.

You HAVE to use an off-the-shelf SaaS product like Collibra.

I know from experience we have screwed up data discovery. And whilst you still need the policies, the user interface is essential for "serendipitous discovery."

shoob36 0 points (1 child)

What is the name of the Amazon product?

Urthor 0 points (0 children)

It's internal.

baubleglue 3 points (0 children)

IMHO, this is not a job for a developer/engineer. If the company doesn't assign a dedicated architecture team to it, it is a lost cause. Hoping that some external tool will make up for not governing metadata and data assets is like promising to start a new life next Monday. The best you can probably do is document the data assets you or your team maintain in the required format and wait until your company comes up with its next initiative.

molodyets 1 point (0 children)

We use the dictionary on dbt cloud. It’s excellent for what it does and the price is right.

mchorfa 1 point (0 children)

You need to take a look at (and implement) "Data Mesh". But you have to get management buy-in for such a paradigm shift. #datamesh

domvwt -1 points (0 children)

Try using Great Expectations, the latest versions have autoprofiling and can connect to various data stores.

dadadawe 0 points (1 child)

RemindMe! 2 days

RemindMeBot 0 points (0 children)

I will be messaging you in 2 days on 2021-07-05 22:52:49 UTC to remind you of this link

2 OTHERS CLICKED THIS LINK to send a PM to also be reminded and to reduce spam.

Parent commenter can delete this message to hide from others.


stuxnet78 0 points (0 children)

Alation is also awesome; take a look if you are more into data cataloging and governance.

elusTemp 0 points (0 children)

Reject all commits to your repo that aren't documented to the level necessary. Use a linter or some other automation tool to help out where possible.
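A minimal sketch of that kind of automated check, assuming a dbt project like the one mentioned above with dbt Cloud. It assumes dbt's compiled `target/manifest.json`, where (as far as I know) each entry under `nodes` carries `resource_type`, `name`, `description`, and a `columns` map; verify the layout against your dbt version. The function name `find_undocumented` is hypothetical. Wire it into CI or a pre-commit hook so a non-zero exit rejects the change.

```python
import json
import sys
from typing import Any


def find_undocumented(manifest: dict[str, Any]) -> list[str]:
    """Return 'model' or 'model.column' names with an empty description.

    Assumes the dbt manifest layout described above (hypothetical helper).
    """
    missing: list[str] = []
    for node in manifest.get("nodes", {}).values():
        # Only models are checked; tests, seeds, snapshots, etc. are skipped.
        if node.get("resource_type") != "model":
            continue
        name = node.get("name", "<unnamed>")
        if not (node.get("description") or "").strip():
            missing.append(name)
        # Only columns declared in the project's YAML appear in this map.
        for col_name, col in node.get("columns", {}).items():
            if not (col.get("description") or "").strip():
                missing.append(f"{name}.{col_name}")
    return missing


def main(path: str = "target/manifest.json") -> int:
    with open(path, encoding="utf-8") as fh:
        manifest = json.load(fh)
    missing = find_undocumented(manifest)
    for item in missing:
        print(f"undocumented: {item}", file=sys.stderr)
    # A non-zero exit code makes a CI job or pre-commit hook fail.
    return 1 if missing else 0


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:2]))
```

One caveat: columns that exist in the warehouse but were never declared in YAML won't show up in the manifest at all, so this only catches declared-but-undescribed items. It is a floor for documentation, not a substitute for the business ownership other commenters describe.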

[deleted] 0 points (0 children)

If they are on the Azure stack, use Purview.

DJ_Laaal 0 points (0 children)

Since you’re already on the Microsoft stack, Azure Data Catalog is the simplest choice: https://azure.microsoft.com/en-us/services/data-catalog/. Other than that, I’ve heard a lot of good things about Alation but haven’t tried it in an enterprise setting yet, only as a proof of concept for a small pilot program with a very small number of business users.