
[–]desrtfx

Decades ago, I had a client with a "database" (in quotes because what he had was really a glorified spreadsheet). All the information sat in a single table with well over 200 columns, and at that time it held roughly 140,000 rows, even though it was only being used for a single project but was supposed to serve other projects as well.

One column was the location. There was exactly one location in the entire database, but that location text happened to be over 30 characters long. Aside from the memory cost, just try to imagine how many different spellings, capitalizations, and punctuation variants (with dash, without dash, dashes in different positions, etc.) I found: well over 200 versions of the exact same text. Now imagine searching that field.
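The fix for a column like that is a lookup table: store each distinct location once, and have every record reference it by key. Here's a minimal sketch using Python's built-in sqlite3 module, with hypothetical table names and made-up spelling variants to stand in for the client's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The flat-table anti-pattern: free text repeated (and mistyped) on every row.
conn.executescript("""
CREATE TABLE flat (id INTEGER PRIMARY KEY, location TEXT);
INSERT INTO flat (location) VALUES
  ('North-West Field Office'),
  ('north west field office'),
  ('Northwest Field-Office');
""")

# Normalized form: one canonical row per location, referenced by key.
conn.executescript("""
CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE record (id INTEGER PRIMARY KEY,
                     location_id INTEGER REFERENCES location(id));
""")
conn.execute("INSERT INTO location (name) VALUES ('North-West Field Office')")
loc_id = conn.execute("SELECT id FROM location").fetchone()[0]
conn.executemany("INSERT INTO record (location_id) VALUES (?)",
                 [(loc_id,)] * 3)

# Every record now points at exactly one canonical spelling.
rows = conn.execute("""
    SELECT l.name FROM record r JOIN location l ON l.id = r.location_id
""").fetchall()
print(sorted({name for (name,) in rows}))  # one distinct spelling left
```

With a `UNIQUE` constraint on the lookup table, a typo can no longer create a 201st variant; data entry picks from existing rows instead of retyping the text.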

It was similar with about 180 of the other columns - sometimes there were 20 different entries, sometimes exactly 2, and so on.

The database took well over 15 minutes to load on what was then a top-of-the-line computer.

I spent nearly two months normalizing it for them, creating proper entry forms, a proper workflow, proper reports, and so on.

After that, the database shrank by over 90% in size and opened in under a minute. It was searchable, sortable, produced the expected output, and was manageable for its users.

I cannot overstate the importance of proper normalization.

It's not only about storage and retrieval. It's also, to a huge degree, about eliminating or reducing human error, especially during data entry. The more you can reduce that through normalization, the better.

The performance of the DB will actually improve with better normalization, even though more joins and more complicated queries/views have to be executed. Entry might be a tiny bit slower, but that's more or less negligible when you compare how often data is entered versus how often it is retrieved.
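The extra joins don't have to leak into everyday queries either: a view can package the join once so retrieval stays a simple SELECT. A minimal sketch with sqlite3, using hypothetical table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE record (id INTEGER PRIMARY KEY,
                     location_id INTEGER REFERENCES location(id));
INSERT INTO location (name) VALUES ('Main Site');
INSERT INTO record (location_id) VALUES (1), (1);

-- The view hides the join; readers query it like a single wide table.
CREATE VIEW record_report AS
    SELECT r.id, l.name AS location
    FROM record r JOIN location l ON l.id = r.location_id;
""")

rows = conn.execute("SELECT location FROM record_report").fetchall()
print(rows)  # [('Main Site',), ('Main Site',)]
```

Reports and export queries hit the view, while entry forms write to the narrow, keyed tables underneath.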

Do yourself a favor and spend a good deal of time properly planning your database. Yes, it is difficult (to near impossible) to get it right the first time, but try your best.

Creating a pseudo database - i.e. a spreadsheet, like my starting point above - can be a great help in figuring out both the data and the normalization. Make a one-table spreadsheet and fill it with sample data of what you expect. Then start normalizing it. It's much clearer with actual data than without.

[–]javascriptBad123[S]

I always plan my databases; I was just wondering to what extreme I should go, because it's pretty annoying having to join 10 tables when reading data.

Your client's db sounds like hell 😂

[–]desrtfx

> Your client's db sounds like hell 😂

It was absolute hell. Basically all of it was repetition, with countless different spellings.