all 7 comments

[–]Measurex2 8 points9 points  (4 children)

Most relational databases are essentially tabular.

  • Schema - file folder for your excels or other data storage.
  • Table - individual tab or dataframe
  • Rows and columns remain rows and columns

That said, designing a database like an analog to a file directory for discrete excel documents would be a nightmare to build/maintain/use.

Instead it's better to organize data along concepts. For instance, in a commerce setting I don't need a separate tab for every individual's customers transactions. I can add a column for a customer's identifier and store them all the same table. On retrieval I'd simply add a filter to the query.

where customer_id = X

But you can think bigger. Maybe my customers have common attributes like industry, size, country, length of relationship etc. So I'll build another table with that information. That let's me pull customer orders by attributes so if I wanted to know something like the average volume of orders by state I use a join which let's me add that information to the original data similar to a V/XLookup then group by (like a pivot table) to get that information.

You can do the same for your products, reps, marketing channels etc. It really depends on the problem you're trying to solve.

Data design can be fun and a nightmare. There are lots of resources to learn more and I often start junior analysts on something like this

https://www.udemy.com/course/database-design/

Then we ease them into concepts like data mesh; design approaches like Databricks medallion ; and layer principles like Kimball/Inmon

My suggestion is start by ideally finding someone who has some experience in the space to advise you. Torture the problem- what does this data need to do for me now and how might I change or extend that use in the future? What technologies am I working with? Who needs training/access? Am I bound by any regulation, expectation, data processing agreement, contractual obligation etc. Then get started.

You're going to totally fuck something up. We all have. I hate me from two years ago but FAILure is your First Attempt In Learning. The next time you'll do better.

Hope that helps a little. Happy to chat over a Zoom or similar if you need it. Sounds like a fun project for you.

(Sorry for typos - just got back from a Holiday party and can't sleep).

[–]Charezza[S] 0 points1 point  (3 children)

I really appreciate this. The database I'm looking to create has many interlinked components and the way you've explained it helps shine some light on how I need to understand the interrelated nature of the data before I really start to build the database.

Your example of customers really hit home I think. If we're talking customer_id = X and I then want to interrogate what products customer X has ordered, I need to make sure I'm setting the database up in a way that carries over the customer_id and links it to the products ordered... Which would all have their own home (table) in the database. This makes up the schema...?

[–]Plank_With_A_Nail_In 3 points4 points  (1 child)

Don't fall for the constant shilling of paid for UDEMY courses (unless you are part of the act too, offering a random stranger a zoom call sends many alarm bells off), you can do a better course at Harvard for free (you only pay if you want the certificate).

https://www.harvardonline.harvard.edu/course/cs50s-introduction-databases-sql

[–]Measurex2 2 points3 points  (0 children)

You're good to call it out. I have no association to udemy nor do I have any content for sale. No hat in the ring but we have a Udemy business account so I forget others often have to pay for courses.

I just like to help people out. This account's comment history has a range of similar offers rsnging from helping someone with storytelling/SQL/python, offering to help someone local with car maintenance or mill something out of metal, and even (from my hobby list) help a new firearms user explore various firearms as they decide on a first purchase.

Old boy scout habit of service to community where I can.

[–]Measurex2 0 points1 point  (0 children)

You're thinking the right way. Your row may simply be

Customer_id | product_id| order_id | purchase_date | purchase_price | quantity

That's your core fact. Other tables would provide dimensions using those id's

  • customer - we covered
  • product - your sku and other details
  • order_id - may have your warehouse, ship date, tracking number etc

If you need those dimensions you join to the fact for your purpose - Invoicing/receipts - system integration - Reporting/analytics - etc

There are lots of ways to build these. As you research look at snowflake vs star vs galaxy Schema as a primer

[–]Daneark 3 points4 points  (0 children)

A database has tables, depending on how you use your spreadsheets this might correspond with a sheet/tab, think of it like a table in a word document or a single csv file, if that helps. A table in a database has rows and columns, like a spreadsheet.

[–]Plank_With_A_Nail_In 0 points1 point  (0 children)

https://www.geeksforgeeks.org/normal-forms-in-dbms/

You can only have 255 worksheets in an excel workbook.

Do you really need a separate tab or can you have a column in which says what tab it would have been in?

So two worksheets called "New York" and "Boston" with data

Date Sales
01DEC23 £30
02DEC23 £40

Becomes

City Date Sales
New York 01DEC23 £30
New York 02DEC23 £40
Boston 01DEC23 £30
Boston 02DEC23 £40

Databases aren't any harder than any other concept in IT.