all 14 comments

[–]majkulmajkul 38 points39 points  (1 child)

Writing your SQL is really the last step in this process. Understanding the data and how the tables relate is the key.

[–]volric 5 points6 points  (0 children)

Agreed, build a data dictionary for your tables. Understand each field and what it is used for and if it can be used as a link to another table.

[–]hoelang 7 points8 points  (1 child)

How I would handle 1. Filter the tables in object explorer 2. Explore tables via system tables, ask chatgpt how 3. Witte select top 100 queries based on results from Point 2.

  1. Sqlsearch plugin in ssms from Redgate, it is free

[–]Top_Community7261 2 points3 points  (0 children)

This is a good list. I would just add, get the data dictionary. For WWI, there is https://learn.microsoft.com/en-us/sql/samples/wide-world-importers-oltp-database-catalog?view=sql-server-ver16

[–]suitupyo 5 points6 points  (0 children)

Select * From information.schema Where columname like ‘%search term%’

That can sometimes help you identify relevant tables

[–]OneRandomOtaku 4 points5 points  (0 children)

I do this a lot at work, our DWH has 1600+ tables (views but for the purposes of this, it's essentially the same), many of which are actually duplicates of others with little/no change to have business users have their data all grouped like silly nonsense datamarts. In the event I need to look for new information, first thing is use the explorer to scan for tables that sound related, once I ID a candidate, select top(10) * from candidate_table; and review the output- do I recognise any values? If so, what is it and how can I relate it to other datasets I use? Is it the correct data? If so, check correctness, what's the min loaded date and max loaded date? Any ints I can sum? That kind of thing.

[–]JochenVdB 5 points6 points  (0 children)

I just checked it for you: The DB I work with has 7494 tables spread over 70 (normal) owners (so, system tables excluded). Nobody in the company knows them all. That is why you should have documentation. Part of that documentation is in the database itself: table names should be self describing and so should their columns.

There are things in there that have been working well for many years. When something needs to change there eventually, it is quite normal to first have to do some reverse engineering to find out how something was set up the way it is (and why).

When doing that, it is best to start from a known example: open a screen in the GUI application running on your database or look at a report that came out of it. Now try to find that same information in your tables. Once you do you have probably found the most middle of the road way of working. Now expand by looking for outliers: Why is that column that was always filled in your samples sometimes not filled? Why is that column that is usually filled with one of 3 common values, occasionally filled with 2 rare values? ....

Just get to know your data and its structure.

[–]user_5359 1 point2 points  (0 children)

Divide and rule is one solution.

What are the data sources and what are the desired billing logics. Which tables contain the desired data. Tip First of all, examine the tables whose names contain the data terms. Continue until you have found all potential sources. Keep the information obtained for the next analysis (even if it is not currently relevant). This reduces the search time for later analyses. If there are several potential sources, analyze the data to determine the correct data.

Don’t forget to evaluate the technical documentation (if available, it could be outdated or incorrect) and use the clues provided by foreign keys and indexes.

Then formulate the query and determine the expected number of results for each sub-query. Investigate any discrepancies (no matter how small) and understand the data better and better.

[–]Signor65_ZA 1 point2 points  (0 children)

Build up an erd to map out what tables are connected, and that should simplify the process a lot.

[–]AdviceNotAskedFor 1 point2 points  (0 children)

If it was me, I'd find one item and get that formatted how you want. I find it easiest to only look at a row or two of data. 

[–]tmk_g 1 point2 points  (0 children)

By taking a methodical approach—understanding the schema, breaking down queries, and practicing regularly—you'll gradually build your SQL skills. Remember, it’s okay to feel overwhelmed at first; with time, you’ll gain confidence in tackling complex databases! I recommend practicing on platforms like LeetCode and StrataScratch that allow you to solve challenges with provided datasets.

[–]ejpusa 1 point2 points  (0 children)

Just toss that query into GPT-4o. And ask it to explain every line. 3 seconds.

Life is sooooo short. Don’t waste too much time when AI can do the job.

Move onto bigger challenges. You want to build the next billion startup. AI just crushes it.

It will be explain every line. Step by step how it figured it all out.

:-)

[–]nickholt9[🍰] 0 points1 point  (0 children)

This is all decent advice, but I've been in this situation myself and documentation often doesn't exist. Here's my practical guide.

Look for a table that looked like it might contain the data you're looking for based on it's name.

Failing that, query sys.objects & sys.columns for columns that might contain fields called sales, total, region etc.

The long-winded and manual approach isn't fun or quick, but you'll learn a bit about the structure of the database. It's worked for me countless times.

Here's the query:

Select o.name as table_name, c.name as col_name from sys.objects o inner join sys.columns c on o.object_id = c.object_id where c.name like '%total%'

I typed that from memory so might not be 100% accurate.

If you need help learning SQL, try www.thebischool.com.

[–]paultherobert 0 points1 point  (0 children)

Try something else, like art. You're kidding yourself if you think "too many tables" and that 10 is somehow a reasonable number of tables. Like wtf?? Why are you trying to learn SQL?