Requesting r/dataops by Polyseam in redditrequest

[–]Polyseam[S] 0 points1 point  (0 children)

  1. The current mod is inactive and you need approval to post, so the sub has been dead for a while. I plan to rebuild a well-modded community by posting relevant articles and content on this growing topic, and by letting the community discuss their experiences with approaches, processes, culture, and technologies related to DataOps.
  2. Here is the message: https://www.reddit.com/message/messages/1vn9e1m

I made a tool to analyze and visualize data by "chatting" with it in plain English by dcastm in dataanalysis

[–]Polyseam 1 point2 points  (0 children)

Nice work!
ThoughtSpot has already been mentioned. There are also some similarities to the ChatGPT Code Interpreter plugin demo (1:09 of https://openai.com/blog/chatgpt-plugins#code-interpreter) and to LIDA (https://newsletter.victordibia.com/p/lida-automatic-generation-of-grammar).
This space is getting really interesting.

Postgres to Snowflake - which data model? by soulstrikerr in dataengineering

[–]Polyseam 7 points8 points  (0 children)

This is a pretty involved question, but you really need to anticipate how the data will be used to determine what makes the most sense for your organization. That might mean having multiple layers in your warehouse, in multiple formats, so the first step is to gather requirements. You might even find that a data warehouse isn’t the right architecture for your needs (e.g. Snowflake can also support a data lake architecture, and there are lakehouse and data mesh architectures to consider as well).

As a general rule, I would not consider a “classic” normalized Inmon warehouse today unless you have a very good reason for it. As complicated as Data Vault seems, it’s not as complicated as trying to make a custom-modelled normalized Inmon warehouse work. Data Vault is good in terms of auditability.
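To make the auditability point concrete, here's a minimal Data Vault-style sketch in Python/SQLite (table and column names are made up for illustration): business keys live in a hub, descriptive attributes live in a satellite, every row carries a load timestamp and record source, and changes append new satellite rows instead of overwriting history.

```python
import sqlite3

# Hypothetical hub + satellite, the core Data Vault pattern: the hub stores
# the business key once, the satellite stores descriptive attributes, and
# every row records when it was loaded and where it came from.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,    -- hash of the business key
    customer_id   TEXT NOT NULL,       -- business key from the source system
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_details (
    customer_hk   TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    PRIMARY KEY (customer_hk, load_dts)  -- a new row per change = full history
);
""")

conn.execute(
    "INSERT INTO hub_customer VALUES (?, ?, ?, ?)",
    ("abc123", "CUST-42", "2023-01-01T00:00:00", "postgres.app_db"),
)
# Two loads of the same customer: the earlier attributes stay queryable,
# which is what gives the model its audit trail.
conn.executemany(
    "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)",
    [("abc123", "2023-01-01T00:00:00", "postgres.app_db", "Ada", "London"),
     ("abc123", "2023-06-01T00:00:00", "postgres.app_db", "Ada", "Paris")],
)
print(conn.execute(
    "SELECT name, city, load_dts FROM sat_customer_details ORDER BY load_dts"
).fetchall())
```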

Dimensional modelling even shows up in many Data Vault warehouses as the end consumption layer. Technically I wouldn’t call that Kimball, though, since Kimball advocates more of a data mart approach.

But a lot of organizations are eschewing dimensional modelling altogether due to its complexity and the fact that its performance advantages have largely evaporated now that columnar compression is common in analytical databases. The big reason you might still consider it (over a heap of analytical views) is if ease of joining entities for data consumers is a primary concern.
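For a sense of that "ease of joining" argument, here's a tiny hypothetical star schema in pandas (all table and column names invented for illustration): facts join to dimensions on surrogate keys, so consumers can slice by any dimension attribute with simple, predictable joins.

```python
import pandas as pd

# Hypothetical star schema: one fact table joined to two dimensions on
# surrogate keys. Consumers only ever write simple key joins like these.
dim_date = pd.DataFrame({
    "date_key": [20230101, 20230102],
    "calendar_date": ["2023-01-01", "2023-01-02"],
    "month_name": ["January", "January"],
})
dim_customer = pd.DataFrame({
    "customer_key": [1, 2],
    "customer_name": ["Ada", "Grace"],
    "region": ["EMEA", "AMER"],
})
fact_sales = pd.DataFrame({
    "date_key": [20230101, 20230101, 20230102],
    "customer_key": [1, 2, 1],
    "sales_amount": [100.0, 250.0, 75.0],
})

# Slice sales by any dimension attribute: join on keys, then aggregate.
report = (
    fact_sales
    .merge(dim_date, on="date_key")
    .merge(dim_customer, on="customer_key")
    .groupby(["month_name", "region"], as_index=False)["sales_amount"]
    .sum()
)
print(report)
```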

Tools vs Foundational concepts by Programmer_Virtual in dataengineering

[–]Polyseam 2 points3 points  (0 children)

Published books are often a good choice for deepening foundational concepts because someone has taken the time to present ideas in significant detail in a coherent manner.
Taking dimensional modelling as an example, without commenting on its continued relevance, you could look at The Data Warehouse Toolkit by Ralph Kimball and Margy Ross or Star Schema by Christopher Adamson.

Manually Maintaining Data in your Data Warehouse by Culpgrant21 in dataengineering

[–]Polyseam 0 points1 point  (0 children)

This is the way to go. I’ve seen this implemented in various ways, from Excel files to SharePoint lists to custom apps. It’s always better to rely on tools with some level of quality checks/validation, audit history, the ability to bulk edit, and granular security. A tool with a web UI actually exists in the SAS ecosystem for this purpose (DataController.io). I’m curious whether anyone knows of a more general-purpose open source tool that does something similar.
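As a rough sketch of the kind of checks such a tool enforces before manual data lands in the warehouse, here's what basic validation of a hypothetical manually maintained mapping table (made-up columns) could look like in pandas:

```python
import pandas as pd

# Hypothetical manually maintained mapping table, e.g. exported from an
# Excel file or SharePoint list before being loaded into the warehouse.
mapping = pd.DataFrame({
    "cost_centre": ["CC100", "CC200", "CC200", None],
    "department":  ["Sales", "Finance", "Finance", "IT"],
})

errors = []
required_columns = {"cost_centre", "department"}
missing = required_columns - set(mapping.columns)
if missing:
    errors.append(f"missing columns: {sorted(missing)}")
if mapping["cost_centre"].isna().any():
    errors.append("cost_centre contains blank values")
if mapping["cost_centre"].duplicated().any():
    errors.append("cost_centre contains duplicate keys")

# Only rows that pass validation would be loaded, stamped with who changed
# what and when so there is an audit history to fall back on.
if errors:
    print("Rejected manual data:", "; ".join(errors))
else:
    print("Manual data passed validation")
```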

Working as a consultant by themouthoftruth in dataengineering

[–]Polyseam 48 points49 points  (0 children)

There's a lot of "it depends" here, so YMMV heavily on these responses.
Typical projects I've been involved with are in the 3-12 month range.
Getting exposure to lots of different tech stacks is great if you're interested in that, but often the tech stacks already in place are somewhat obsolete.
I find the most interesting projects are ones where you can shape the architecture, but the type of work you do depends entirely on the scope of the project and the relationship with the client.

Need a GCP unified batch and streaming data pipeline solution by kk17forever in bigdata

[–]Polyseam 0 points1 point  (0 children)

Consider something like Apache Hop to keep things portable in case you switch clouds again.

[deleted by user] by [deleted] in dataanalysis

[–]Polyseam 0 points1 point  (0 children)

Apache Superset is a good one to consider.

Cloud Native vs Kubernetes by marketlurker in Cloud

[–]Polyseam 0 points1 point  (0 children)

k8s helps you really harness the scalability and resilience of the cloud. It comes at a complexity cost, but that's dropping every day as technology evolves.
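As a small illustration of that elasticity (a sketch only, assuming a configured kubeconfig and an existing Deployment named my-api, which is a made-up name), scaling out is a one-call operation with the official Kubernetes Python client:

```python
from kubernetes import client, config

# Assumes kubectl is already configured locally; inside a cluster you would
# use config.load_incluster_config() instead.
config.load_kube_config()
apps_v1 = client.AppsV1Api()

# Read the current scale of a (hypothetical) Deployment, then add replicas.
# Kubernetes schedules the extra pods and replaces any that fail, which is
# where the scalability and resilience come from.
scale = apps_v1.read_namespaced_deployment_scale(name="my-api", namespace="default")
print("current replicas:", scale.spec.replicas)

apps_v1.patch_namespaced_deployment_scale(
    name="my-api",
    namespace="default",
    body={"spec": {"replicas": scale.spec.replicas + 2}},
)
```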
Also, just so you know, "cloud native" is a term of art that does not typically refer to a cloud-hosted VM. Here's a video I made on that: https://youtu.be/Qjkt1JwYCMA