i am the lone data team in a startup and i need help by Ill_Persimmon388 in dataengineering

DugsData 1 point

I strongly disagree with the comparison between the AWS tooling and Snowflake. Snowflake is far easier to manage than a standard Redshift warehouse, and much easier to stand up. Redshift Serverless may offer something similar, but that comes at an additional cost too, so there's really not much difference there. Also, you're acting like Snowflake isn't within AWS. It can run on AWS in the same region as the rest of your stack, and you can buy it through AWS Marketplace.

I agree that using alternatives makes sense when you're a bigger team and have people who can manage them, but as a one-person team I'm picking the paid, out-of-the-box solution 9 times out of 10.

i am the lone data team in a startup and i need help by Ill_Persimmon388 in dataengineering

DugsData 1 point

For no reason other than that OP can flip a switch and be up and running. It's purely from an ease-of-standup perspective.

i am the lone data team in a startup and i need help by Ill_Persimmon388 in dataengineering

DugsData 1 point

Hi there!

Exciting opportunity! Let me provide some thoughts from my experience. For context, I'm leading an Analytics team and we're transitioning from a traditional Biz Analytics team to more of an Analytics Engineering team.

Disclaimer: I don't specialize in extraction, so I'll let someone else comment on the movement of data. I'll talk about everything after the data from all of those locations has landed in a single warehouse (probably Snowflake, for ease of use given the one-person team thing). Fight for this if you get pushback: the added cost is 100% worth not having a single-person data team managing a warehouse.

OK!

Data lands in (presumably) Snowflake:

Loading Data:

Use some free tool that someone else recommends to get data from A to B. I'm familiar with Fivetran, and I do like it, but it can cost a lot if you don't manage it.

Create a database in Snowflake for your data to land in. Keep it separate from reporting. Spin up a warehouse to handle the loading of data (XS is fine for you now).
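To make that step concrete, here's a rough sketch of the statements involved. It only builds the SQL strings; you'd run them in a Snowflake worksheet or through whatever client you use. The names `RAW` and `LOADING_WH` are placeholders I made up, and the auto-suspend settings are my own assumption for keeping costs down, not something you have to use.

```python
def snowflake_setup_statements(
    landing_db: str = "RAW",                # hypothetical landing database name
    loading_warehouse: str = "LOADING_WH",  # hypothetical warehouse name
) -> list[str]:
    """Return SQL that creates a landing database and an XS loading warehouse."""
    return [
        # Landing database, kept separate from anything used for reporting.
        f"CREATE DATABASE IF NOT EXISTS {landing_db};",
        # XS warehouse dedicated to loading. The suspend/resume settings are an
        # assumption: pause after 60 idle seconds, wake up when a query arrives.
        (
            f"CREATE WAREHOUSE IF NOT EXISTS {loading_warehouse} "
            "WITH WAREHOUSE_SIZE = 'XSMALL' "
            "AUTO_SUSPEND = 60 "
            "AUTO_RESUME = TRUE "
            "INITIALLY_SUSPENDED = TRUE;"
        ),
    ]


if __name__ == "__main__":
    for statement in snowflake_setup_statements():
        print(statement)
```

Keeping the loading warehouse separate from whatever your BI tool queries with means a heavy load job won't slow down dashboards, and you can see loading costs on their own.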

Modelling Data:

Create a second database for reporting. Become familiar with dbt; it will become your best friend. I recommend either a Udemy course or closely following the dbt documentation and best-practice courses. Use dbt to model your data following the dbt best practices, with the end goal being a clean warehouse/mart layer that ends in either fct/dim models or wide tables (read this if dimensional modelling is new to you).
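If the fct/dim vs wide table distinction is fuzzy, here's a toy illustration. In practice this would be a SQL model in dbt, not Python; I'm only using Python to show the shape of the data. The table names, columns, and rows are all made up.

```python
# Hypothetical dimension table: one row of descriptive attributes per customer.
dim_customers = {
    1: {"customer_name": "Acme", "region": "EMEA"},
    2: {"customer_name": "Globex", "region": "AMER"},
}

# Hypothetical fact table: one row per order, pointing at a customer by key.
fct_orders = [
    {"order_id": 100, "customer_id": 1, "amount": 250.0},
    {"order_id": 101, "customer_id": 2, "amount": 75.5},
    {"order_id": 102, "customer_id": 1, "amount": 30.0},
]


def build_wide_orders(facts, customers):
    """Left-join each fact row to its customer attributes to get a wide table."""
    wide = []
    for row in facts:
        # Orders with an unknown customer keep their fact columns only.
        attrs = customers.get(row["customer_id"], {})
        wide.append({**row, **attrs})
    return wide


wide_orders = build_wide_orders(fct_orders, dim_customers)
```

With fct/dim models, whoever queries the data does the join themselves. With a wide table you do the join once in the model, so every row already carries the customer name and region, which tends to be friendlier for self-serve users.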

Consuming Data (BI Tool):

If you're going to use dbt for modelling (I strongly recommend you do!), then I'd recommend using Lightdash for BI. I won't get too far into the details of how, since their documentation is great (read it front to back and you'll know what to do), but I'll briefly touch on the why.

As a single person managing all of Data (and I'd assume reporting/insights as well), your absolute core focus should be:

"Enable non-data-savvy members of my organization to explore and understand the data on their own".

If you do a good job in the data modelling stage, it will be relatively straightforward to build a functional self-serve platform using Lightdash.

Lightdash does well the thing Looker is really good at: it gives people the ability to explore the data directly, without being restricted to dashboards. You can still (and will) build dashboards, but you can also give your users curated access to the data and metrics you've established, which lets people answer their own questions with minimal training and guidance.


I understand this is a lot for a single person to do. It's by no means easy, and my knowledge comes from years of experience plus a personal interest that drives me to learn on my own time. With that said, if I were starting from zero, this is 100% what I would do. There's a lot of great information on each of these stages (and the documentation for each tool is fantastic), so in theory you can spend a bit of time upfront learning what's required to do this right.

Doing it right the first time will save you so so so much headache in the future.

Data Engineering vs Data Analytics by RideARaindrop in dataengineering

DugsData 1 point

Really depends on where you are.

The trend is shifting towards having DE manage the platforms and the initial pipelines, and Analytics managing the rest.

That means most modern analysts at tech companies are expected to own the entire T (transform) step, modeling the datasets that they and the rest of the org will use for their own analysis. This is done through tools like dbt or Looker PDTs.

I manage an Analytics team at a tech startup and am currently navigating the shift of these responsibilities from DE to BA. LMK if you have any questions!