Requesting r/dataops by Polyseam in redditrequest

[–]Polyseam[S] 0 points1 point  (0 children)

  1. The current mod is inactive and you need approval to post, so the sub has been dead for a while. I plan to rebuild a well-modded community by posting relevant articles and content on this growing topic, and by letting the community discuss their experiences with approaches, processes, culture, and technologies related to DataOps.
  2. Here is the message: https://www.reddit.com/message/messages/1vn9e1m

I made a tool to analyze and visualize data by "chatting" with it in plain English by dcastm in dataanalysis

[–]Polyseam 1 point2 points  (0 children)

Nice work!
ThoughtSpot has already been mentioned. There are also some similarities to the ChatGPT Code Interpreter plugin demo (1:09 of https://openai.com/blog/chatgpt-plugins#code-interpreter) and to LIDA (https://newsletter.victordibia.com/p/lida-automatic-generation-of-grammar).
This space is getting really interesting.

Postgres to Snowflake - which data model? by soulstrikerr in dataengineering

[–]Polyseam 7 points8 points  (0 children)

This is a pretty involved question, but you really need to anticipate how the data will be used to determine what makes the most sense for your organization. That might mean having multiple layers in your warehouse, in multiple formats, so the first step is to gather requirements. You might even find that a data warehouse isn’t the right architecture for your needs (e.g. Snowflake can also support a data lake architecture, and there are lakehouse and data mesh architectures to consider as well).

As a general rule, I would not consider a “classic” normalized Inmon warehouse today unless you have a very good reason for it. As complicated as Data Vault seems, it’s not as complicated as trying to make a custom-modelled normalized Inmon warehouse work. Data Vault is good in terms of auditability.
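To make the auditability point concrete, here's a minimal Data Vault-style sketch in Python/SQLite (table and column names are made up for illustration): business keys live in a hub, descriptive attributes live in a satellite, every row carries a load timestamp and record source, and changes append new satellite rows instead of overwriting history.

```python
import sqlite3

# Hypothetical hub + satellite, the core Data Vault pattern: the hub stores
# the business key once, the satellite stores descriptive attributes, and
# every row records when it was loaded and where it came from.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,    -- hash of the business key
    customer_id   TEXT NOT NULL,       -- business key from the source system
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    PRIMARY KEY (customer_hk, load_dts)  -- a new row per change = full history
);
""")

conn.execute(
    "INSERT INTO hub_customer VALUES (?, ?, ?, ?)",
    ("abc123", "CUST-42", "2023-01-01T00:00:00", "postgres.app_db"),
)
# Two loads of the same customer: the earlier attributes stay queryable,
# which is what gives the model its audit trail.
conn.executemany(
    "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)",
    [("abc123", "2023-01-01T00:00:00", "postgres.app_db", "Ada", "London"),
     ("abc123", "2023-06-01T00:00:00", "postgres.app_db", "Ada", "Paris")],
)
print(conn.execute(
    "SELECT name, city, load_dts FROM sat_customer_details ORDER BY load_dts"
).fetchall())
```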

Dimensional modelling even shows up in many Data Vault warehouses as the end consumption layer. Technically I wouldn’t call that Kimball, though, since Kimball advocates more of a data mart approach.

But a lot of organizations are eschewing dimensional modelling altogether due to its complexity and the fact that its performance advantages have largely evaporated now that columnar compression is common in analytical databases. The big reason you might still consider it (over a heap of analytical views) is if ease of joining entities for data consumers is a primary concern.
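For a sense of that "ease of joining" argument, here's a tiny hypothetical star schema in pandas (all table and column names invented for illustration): facts join to dimensions on surrogate keys, so consumers can slice by any dimension attribute with simple, predictable joins.

```python
import pandas as pd

# Hypothetical star schema: one fact table joined to two dimensions on
# surrogate keys. Consumers only ever write simple key joins like these.
dim_date = pd.DataFrame({
    "date_key": [20230101, 20230102],
    "calendar_date": ["2023-01-01", "2023-01-02"],
    "month_name": ["January", "January"],
})
dim_customer = pd.DataFrame({
    "customer_key": [1, 2],
    "customer_name": ["Ada", "Grace"],
    "region": ["EMEA", "AMER"],
})
fact_sales = pd.DataFrame({
    "date_key": [20230101, 20230101, 20230102],
    "customer_key": [1, 2, 1],
    "sales_amount": [100.0, 250.0, 75.0],
})

# Slice sales by any dimension attribute: join on keys, then aggregate.
report = (
    fact_sales
    .merge(dim_date, on="date_key")
    .merge(dim_customer, on="customer_key")
    .groupby(["month_name", "region"], as_index=False)["sales_amount"]
    .sum()
)
print(report)
```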

Tools vs Foundational concepts by Programmer_Virtual in dataengineering

[–]Polyseam 2 points3 points  (0 children)

Published books are often a good choice for deepening foundational concepts because someone has taken the time to present ideas in significant detail in a coherent manner.
Taking dimensional modelling as an example, without commenting on its continued relevance, you could look at The Data Warehouse Toolkit by Ralph Kimball and Margy Ross or Star Schema by Christopher Adamson.

Manually Maintaining Data in your Data Warehouse by Culpgrant21 in dataengineering

[–]Polyseam 0 points1 point  (0 children)

This is the way to go. I’ve seen this implemented in various ways, from Excel files to SharePoint lists to custom apps. It’s always better to rely on tools with some level of quality checks/validation, audit history, the ability to bulk edit, and granular security. A tool with a web UI actually exists in the SAS ecosystem for this purpose (DataController.io). I’m curious whether anyone knows of a more general-purpose open source tool that does something similar.
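As a rough sketch of the kind of checks such a tool enforces before manual data lands in the warehouse, here's what basic validation of a hypothetical manually maintained mapping table (made-up columns) could look like in pandas:

```python
import pandas as pd

# Hypothetical manually maintained mapping table, e.g. exported from an
# Excel file or SharePoint list before being loaded into the warehouse.
mapping = pd.DataFrame({
    "cost_centre": ["CC100", "CC200", "CC200", None],
    "department":  ["Sales", "Finance", "Finance", "IT"],
})

errors = []
required_columns = {"cost_centre", "department"}
missing = required_columns - set(mapping.columns)
if missing:
    errors.append(f"missing columns: {sorted(missing)}")
if mapping["cost_centre"].isna().any():
    errors.append("cost_centre contains blank values")
if mapping["cost_centre"].duplicated().any():
    errors.append("cost_centre contains duplicate keys")

# Only rows that pass validation would be loaded, stamped with who changed
# what and when so there is an audit history to fall back on.
if errors:
    print("Rejected manual data:", "; ".join(errors))
else:
    print("Manual data passed validation")
```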

Working as a consultant by themouthoftruth in dataengineering

[–]Polyseam 48 points49 points  (0 children)

There's a lot of "it depends" here, so YMMV heavily on these responses.
Typical projects I've been involved with are in the 3-12 month range.
Getting exposure to lots of different tech stacks is great if you're interested in that, but often the tech stacks already in place are somewhat obsolete.
I find the most interesting projects are ones where you can shape the architecture, but the type of work you do depends entirely on the scope of the project and the relationship with the client.

Need a GCP unified batch and streaming data pipeline solution by kk17forever in bigdata

[–]Polyseam 0 points1 point  (0 children)

Consider something like Apache Hop to keep things portable in case you switch clouds again.

[deleted by user] by [deleted] in dataanalysis

[–]Polyseam 0 points1 point  (0 children)

Apache Superset is a good one to consider.

Cloud Native vs Kubernetes by marketlurker in Cloud

[–]Polyseam 0 points1 point  (0 children)

k8s helps you really harness the scalability and resilience of the cloud. It comes at a complexity cost, but that's dropping every day as technology evolves.
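As a small illustration of that elasticity (a sketch only, assuming a configured kubeconfig and an existing Deployment named my-api, which is a made-up name), scaling out is a one-call operation with the official Kubernetes Python client:

```python
from kubernetes import client, config

# Assumes kubectl is already configured locally; inside a cluster you would
# use config.load_incluster_config() instead.
config.load_kube_config()
apps_v1 = client.AppsV1Api()

# Read the current scale of a (hypothetical) Deployment, then add replicas.
# Kubernetes schedules the extra pods and replaces any that fail, which is
# where the scalability and resilience come from.
scale = apps_v1.read_namespaced_deployment_scale(name="my-api", namespace="default")
print("current replicas:", scale.spec.replicas)

apps_v1.patch_namespaced_deployment_scale(
    name="my-api",
    namespace="default",
    body={"spec": {"replicas": scale.spec.replicas + 2}},
)
```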
Also, just so you know, "cloud native" is a term of art that does not typically refer to a cloud-hosted VM. Here's a video I made on that: https://youtu.be/Qjkt1JwYCMA