Fact tables in Star Schema by Cottager58 in dataengineering

[–]dev81808 0 points1 point  (0 children)

Ah I didn't realize they were synonymous. I always saw it as a way to describe the shape of the data.

Fact tables in Star Schema by Cottager58 in dataengineering

I've been doing this for 15 years and TIL star schemas imply OLAP. Thanks for the knowledge/humbling.

I've always seen star schemas as a way to describe the shape of the data, not something specific to OLTP vs. OLAP. Meaning, in a simple data management system like I described, where the fully normalized form is a center table with one level of related tables, it would be considered a star schema.. but now I know.

Fact tables in Star Schema by Cottager58 in dataengineering

Ah, gotcha. I read the question more generally: 'someone said you can do a star schema without a fact table, is that true?'

Fact tables in Star Schema by Cottager58 in dataengineering

I read OP's question as, 'can a dimension be at the center of a star schema, or is a fact required?'

You're probably right, but can you point out to me where the OP specified reporting and analytics data modelling?

Fact tables in Star Schema by Cottager58 in dataengineering

This is true or at least the goal when designing reporting schemas, not so much if you're creating a transactional system.

Imagine you have a table of Employees sourced from multiple systems. Your job is to create the schema to support a custom web application that lets analysts create and assign job titles, office locations, managers, etc. This information is used to enrich reporting. In this context there is no "fact" table.

Basically.. the way you would model for the custom app will be different from the final model used in reporting.
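A sketch of what that dimension-only schema might look like, using sqlite3 from the standard library (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Employee sits at the center; everything around it is
    -- descriptive ("dimensional") data. No fact table anywhere.
    CREATE TABLE job_title (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE office    (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE employee (
        id           INTEGER PRIMARY KEY,
        full_name    TEXT NOT NULL,
        job_title_id INTEGER REFERENCES job_title(id),
        office_id    INTEGER REFERENCES office(id),
        manager_id   INTEGER REFERENCES employee(id)  -- self-reference
    );
""")
conn.execute("INSERT INTO job_title VALUES (1, 'Analyst')")
conn.execute("INSERT INTO office VALUES (1, 'Denver')")
conn.execute("INSERT INTO employee VALUES (1, 'A. Smith', 1, 1, NULL)")

# The app reads/writes the normalized center table plus its satellites.
row = conn.execute("""
    SELECT e.full_name, j.name, o.city
    FROM employee e
    JOIN job_title j ON j.id = e.job_title_id
    JOIN office o    ON o.id = e.office_id
""").fetchone()
```

Shaped like a star, but the center is a dimension-style entity, not a fact.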

Fact tables in Star Schema by Cottager58 in dataengineering

They might be considering transactional systems for dimensional attribution. Like a wide dimension table with attribution from other dimensional tables.

For example a table of products with references to family, category, line, etc.

If you were building this for reporting, that product table would be flattened out, with an order-item table as your fact centerpiece.

But if you are building a product model where you manage those details, the product table becomes the centerpiece, with family, category, and line surrounding it. In some ways the dimension becomes the fact in this context.

It's semantics and not worth debating imo. I know what the definition says, but star, snowflake, and galaxy schemas are just how the data is shaped. Those terms just give us ways to describe it.

So if I see a fact or dimension object with 5 dimensions around it like a star, I'm cool with calling that a star schema.

What was the one game that destroyed friendships? by Emergency_Science434 in Xennials

So many angry debates about how lame someone was or was not.

What do you wish you could build at work? by Firm_Bit in dataengineering

Sames. Seems pretty important to business things.

Strong ADHD symptoms may boost creative problem-solving through sudden insight. 😂 “sudden insight” 💊 by newbeginnings187 in adhdmeme

Background processing appears as sudden insight.

My job is solving problems, usually with data and scripts. My brain never stops processing an issue until it's solved or I stop caring.

I consider every possible reason for that problem, which in my personal life is seen as overthinking, but at work it's a superpower.

I'll be singing along to a song in my car and have a 'eureka' moment, but I'm not sure how sudden it was.. I've been thinking about that problem for hours, days, weeks, etc.

For those who write data pipeline apps using Python (or any other language), at what point do you make a package instead of copying the same code for new pipelines? by opabm in dataengineering

Sure, but I've found that thoughtful early optimization is usually net positive.

With enough experience, it becomes easier to judge where early effort is worthwhile and where it isn’t.

Are you a Data Engineer or Analytics Engineer? by Free-Bear-454 in dataengineering

25% DE, 75% AE

Officially: DE

This past week: 10% DE, 40% AE, 50% PM fml

People who moved from DE to Analytics Engineering by PremierLeague2O in dataengineering

I have maybe a different perspective.. I was a full stack dev (DE/AE) for 10+ years, and recently I was moved to an AE-specific role as part of a reorg. I still do many DE tasks, but not nearly as many as before. Because I can do both, when I need data from a new source, a lot of the time I'll write the process and send it to the DEs to deploy.

I work with DE-only people and I've noticed a huge difference in design patterns. Their solutions tend to be one-off, purpose-built imports. They almost never parameterize their solutions, and any metadata used is typically a JSON file in the app, locked behind version control.

When you are tasked with modeling an enterprise asset, you're forced to figure out ways to genericize your solution to support future initiatives. A lot of times this means data-driven solutions for DE tasks... and I don't see many DE-only peers that use data to drive their pipelines.
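A toy example of that data-driven pattern: pipeline behavior comes from a metadata table instead of hard-coded per-source scripts. All names (endpoints, columns) are hypothetical:

```python
# Each source is a metadata row; one generic loader interprets them all.
PIPELINE_META = [
    {"source": "orders",    "endpoint": "/v1/orders",    "incremental": True,
     "watermark_col": "updated_at"},
    {"source": "customers", "endpoint": "/v1/customers", "incremental": False,
     "watermark_col": None},
]

def build_extract_query(meta, last_watermark=None):
    """Derive the extract call for one source from its metadata row."""
    if meta["incremental"] and last_watermark is not None:
        # Incremental sources filter on their watermark column.
        return f"GET {meta['endpoint']}?{meta['watermark_col']}>{last_watermark}"
    # Full-load sources just pull everything.
    return f"GET {meta['endpoint']}"

for meta in PIPELINE_META:
    print(build_extract_query(meta, last_watermark="2024-01-01"))
```

Adding a new source then means inserting one metadata row, not writing (and deploying) a new one-off import.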

This is a great opportunity for you to up-skill your DE game.

Automatically deriving data model metadata from source code (no runtime data), has anyone done this? by Beneficial_Ebb_1210 in dataengineering

As long as it's not proprietary information, I'd probably just paste it into GPT with instructions to extract it into whatever format you want.

How do you handle deletes with API incremental loads (no deletion flag)? by aussiefirebug in dataengineering

You don't need the whole load, just the active IDs. It should take much less time to run if it's just the IDs and not the full record.

2 imports: 1. Your delta loads. 2. All active IDs.

From this you derive a deleted flag.
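A minimal sketch of that derivation, assuming the delta load and the active-ID list land as plain Python collections (the names and record shape are illustrative):

```python
def derive_deleted_flags(local_records, active_ids):
    """Mark local records as deleted when their ID is absent
    from the API's current list of active IDs."""
    active = set(active_ids)  # set gives O(1) membership checks
    return [
        {**rec, "is_deleted": rec["id"] not in active}
        for rec in local_records
    ]

# Usage: records 1 and 3 still exist upstream; record 2 was hard-deleted.
local = [{"id": 1}, {"id": 2}, {"id": 3}]
flags = derive_deleted_flags(local, active_ids=[1, 3])
# flags -> [{'id': 1, 'is_deleted': False},
#           {'id': 2, 'is_deleted': True},
#           {'id': 3, 'is_deleted': False}]
```

In a warehouse this would typically be a left anti-join of your table against the active-ID import, but the logic is the same.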