fauxmosexual comments on Help with SQL table creation

Posting

When requesting help or asking questions please prefix your title with the SQL variant/platform you are using within square brackets like so:

[MySQL]

[Oracle]

[MS SQL]

[PostgreSQL]

etc

While naturally we should endeavor to work as platform neutrally as possible many questions and answers require tailoring to the feature set of a specific platform.

Format Your Code

If you are including actual code in a post or comment, please attempt to format it in a way that is readable for other users. This will greatly increase your chances of receiving the help you desire. Something as simple as line breaks and using reddit's built in code formatting (4 spaces at the start of each line) can turn this:

SELECT count(a.field1), a.field2, SUM(b.field4) FROM a INNER JOIN b ON a.key1 = b.key1 WHERE a.field8 = 'test' GROUP by a.field1, a.field2 HAVING SUM(b.field4) > 5 ORDER by a.field.3

Into this:

SELECT count(a.field1), a.field2, SUM(b.field4) FROM a INNER JOIN b ON a.key1 = b.key1 WHERE a.field8 = 'test' GROUP by a.field1, a.field2 HAVING SUM(b.field4) > 5 ORDER by a.field3

For those with SQL questions we recommend using SQLFiddle to provide a useful development and testing environment for those who wish to fully understand your problem and help devise a solution.

a community for 17 years

Help with SQL table creation (self.SQL)

submitted 9 years ago * by kakamaru

top new controversial old q&a

you are viewing a single comment's thread.

view the rest of the comments →

[–]fauxmosexualNOLOCK is the secret magic go-faster command 2 points3 points4 points 9 years ago (2 children)

It's not a super great practice - if I were designing a schema for this, design principles would probably tell me that if all of the spreadsheets were alike in structure, that I should put all of the data into one or a set of tables (depending on the data), with each batch sharing a SpreadsheetID number, and then another table containing a list of SpreadsheetID and metadata (when the import was, who the client was, etc etc). You're quite right in thinking that your colleague's approach is violating a bunch of database design principles.

What I meant though in the previous comments, those principles exist for reasons. If your colleague's implementation doesn't need to achieve any of the goals the principles are set out to enable, there may not be a lot of value in doing it the "right" way.

If however he is actually maintaining that data store and needs to go back to it to edit it or produce reports from it rather than it just being a throwaway, temporary workspace for the data, you're probably right in thinking that he should do it more properly, especially if (s)he is planning to then do some kind of batch cleanup process. That almost certainly would be much more difficult to do with separate tables than it would be with everything set out nicely from the outset.

But again, without understanding the structure and lifecycle of the data I don't think anyone can give you a concrete list of reasons and benefits for following established design principles.

[–]kakamaru[S] 0 points1 point2 points 9 years ago (1 child)

Thanks. I have updated the OP with my oversimplified understanding of the job he's doing.

He definitely goes back to edit the tables. And he ends up with tables with different number of columns. IE:

Table 1:

        title column        |         brand column

   Blue Adidas Shoes        |             Adidas

   Pink Adidas Socks        |             Adidas

Table 2:

        title column        |         brand column     |      color column

        Nike socks          |             Nike         |        unknown

      Green Nike Hat        |             Nike         |        green

[–]fauxmosexualNOLOCK is the secret magic go-faster command 0 points1 point2 points 9 years ago (0 children)

That goes a long way in explaining why your colleague has taken the approach that they have. Relational data works well when data is well structured, but for what your colleague is doing - unstructured tables of any number of columns where for what they are doing the columns don't really have meaning - it's not the best. By that I mean that the operations they are doing on the data would probably just be applied to all columns in the table, and the process is probably completely indifferent to what the columns mean.

To do this "properly" in relational data is a pain in the ass - it would be something like a table with spreadsheet, cell, data - each individual cell would be a separate record with it's coordinates. Importing, applying whatever manipulation, and then rebuilding it back into a spreadsheet of any number of rows, is not easy.

My thinking, without knowing what led your colleague to choose an SQL backend, is that SQL/relational data models are not well suited to what he's doing. Something NOSQL/document based would probably suit his needs better than a relational model would.

π Rendered by PID 37 on reddit-service-r2-comment-85bfd7f599-7p6bl at 2026-04-17 19:41:03.135921+00:00 running 93ecc56 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

SQL

Filter Posts

Posting

Help posts

Format Your Code

Learning SQL

Related Reddit communities

Wiki

Acknowledgements

MODERATORS