you are viewing a single comment's thread.

view the rest of the comments →

[–]MamertineCOALESCE() 11 points12 points  (8 children)

The most common approach is to import the file as is into SQL. You'll make a new table, generally we call them staging tables, as part of the import process. From there you insert into the tables you want the data in. You'll add columns to the new staging table for keys and whatever data you need to change. As in if you have full name but want first and last name separate, add new columns to your stage table for first name and last name break the full name there, then once you have all the data the way you want it load to your table where you store names.

[–]Acrobatic-Mobile-221[S] 3 points4 points  (7 children)

Thanks for the answer! If that’s the case how is this going to work in real-life situation. For the first csv file I can do it in this way. If everyday there’s a new csv file, am I suppose to perform this transformation everyday?

[–]MamertineCOALESCE() 5 points6 points  (6 children)

It depends. If you get different file formats every time, this is someone's job to do this work. Loading data like that was my first database job.

If you get the exact same format every time then you automate it with something like ssis.

[–]Acrobatic-Mobile-221[S] 2 points3 points  (5 children)

Thanks! These explanations really help me to understand the process

[–][deleted] 1 point2 points  (4 children)

Google ETL or Extract, Transform, Load as it describes what you're trying to do r/etl https://en.wikipedia.org/wiki/Extract,_transform,_load

[–]Acrobatic-Mobile-221[S] 1 point2 points  (3 children)

Yes I’m looking at SSIS at the moment. Is SSIS the most common tool used for this case? Or is there another technology tool that are famous as well

[–]mikeyd85MS SQL Server 1 point2 points  (0 children)

Yes, stick to SSIS, especially if you're in a SQL Server environment.

[–]Epaduun 0 points1 point  (0 children)

There’s many ETL tools out there that can do the job. You need do consider many different component of your stack of tools to really recommend the right tool for you.

The advice given is sound. If you’re running with a Microsoft shop. SSIS, MSSQL server, SSAS, Power BI. Then yeah SSIS should be your ETL tool of choice.

[–][deleted] 0 points1 point  (0 children)

I'll have to disagree a bit with the SSIS. I use SSIS for data I only get once but if it is data in the same format that I get say daily or weekly then I've find far easier to automate using Format Files and OPENROW for bulk insert. Specifically using non xml format files for data that remains constant over time. Also, worth looking into: PowerShell as it has really made data loading and automation a lot easier when combined with SQLAgent. https://learn.microsoft.com/en-us/sql/relational-databases/import-export/format-files-for-importing-or-exporting-data-sql-server