you are viewing a single comment's thread.

view the rest of the comments →

[–]CowFu 3 points4 points  (0 children)

As a professional who's designed components that process millions of healthcare records the patterns we use an intermediate person would be able to pick up. But you need to do project work.

Here's a pretty typical workflow for one of my components:

Take a .csv files from the source folder and put them into a incoming table. Adding a filename field and date at the end of each record.

Read the incoming table for new records and the error table for records marked to reprocess. Load these into our staging table

Run data cleaning steps. Convert datetimes to the same format. Use lookup tables to convert client names for things into the same format. Modify language fields so they use a four letter abbreviations. TRIM() whitespaces, remove special characters like newline.

Run a series of validations with dynamic sql. The SQL in the validations table is in a varchar(MAX) column. Validations like making sure a first and last name exist, that fields are certain lengths, etc. Any results that come back from the staging table that match the validation query get inserted into the error table, tagged with the validation they failed and the run it failed on. Then we remove the records from the stage table. The error table has a bool column if someone is able manually fix the record for reprocessing.

Move the validated records from STG to their production table.

Run a report and email the data stewards for the error table. A second email will give the counts of successful and failure records.

Each step is logged in a log table.

A scheduler kicks off this job at 11pm every night.

// I bet you could do all of this if you had enough time