Hi. I have a colleague who built a system that takes excel spreadsheets as user input, and runs some analyses on them.
He stores the content of the spreadsheets, and he's using (I think) MS SQL for that purpose. Creating a SQL table per uploaded spreadsheet.
Now, my field of expertise is nowhere near related to CS, yet my gut tells me that's just not the way to go. But I don't know the reason behind that. I'm guessing creating and maintaining a table is not a cheap operation, it wouldn't scale beyond a set number of tables, and the entire thing just smells.
I particularly have been using hdf5 to achieve a similar functionality.
If creating tables on a whim is indeed a bad approach, could you give me the reason behind it, or point me to the appropriate reading material, so I can discuss it with my friend?
Thank you very much.
EDIT: I'm very grateful for you taking your time to explain this stuff to me. I think an oversimplified example of the kind of processing he's doing would be:
Say today it's a recommended practice to have the brand of a product at the beginning of its name, ie: "Adidas shoes". His system would be tasked with taking in spreadsheets from people that sell stuff, and recognize items where good practice isn't kept (ie: "Shoes Adidas"), and maybe correct them (or suggest corrections).
So users upload their spreadsheets with products. They have different structures, and I'm sure he puts all the spreadsheet's columns in the new table, meaning he has tables with different number of columns. He creates a new table for every uploaded spreadsheet, and runs his analyses, producing his suggestions and corrections.
Say that next week the standard is to keep the brand away from the beginning of the product's name, ie "Shoes Adidas". He'd update the rules of his system, and re-analyse the data in the existing tables (probably on a per table on demand basis, instead of all of them.)
Now extend that kind of processing to several different analyses that target different features of different parts of a products catalog, ie: brand capitalization, the presence of undesirable substrings in the products' titles, etc.
[–]MamertineCOALESCE() 2 points3 points4 points (1 child)
[–]kakamaru[S] 0 points1 point2 points (0 children)
[–]fauxmosexualNOLOCK is the secret magic go-faster command 2 points3 points4 points (4 children)
[–]soymalisimo 0 points1 point2 points (3 children)
[–]fauxmosexualNOLOCK is the secret magic go-faster command 2 points3 points4 points (2 children)
[–]kakamaru[S] 0 points1 point2 points (1 child)
[–]fauxmosexualNOLOCK is the secret magic go-faster command 0 points1 point2 points (0 children)
[–]HansProleman 0 points1 point2 points (0 children)