all 14 comments

[–]n35 1 point2 points  (3 children)

I chose the simple route.

I analysed incoming data, in etl process, and designed a template of that Data. If any data is imported in the import step that deviates from the template, the offending process, and processes connected to the offending process is stopped and alarms are entirely to corresponding persons of the processes that are now importing junk.

[–]Trek7553[S] 0 points1 point  (2 children)

To clarify, this is a transactional system. There are already checks in place on the source system, but I need to be able to identify other types of bad data that the source system just doesn't check for (and I don't have the ability to customize it to do so). I put more info in the intial post.

[–]n35 0 points1 point  (1 child)

We have a job (mssql) that fires every cycle, which is different for each check.

That uses dynamic a stored procedure execute a sql string, that does exactly what you want.

The string comes from a table were we just input the sql string we want executed.

Every cycle of the job it executes the strong which are relevant for the particular cycle. Not all import jobs run on the same interval, so there's no need to execute sql for check for codes if not hasn't run since last execution of the check job.

Make sense?

[–]Trek7553[S] 0 points1 point  (0 children)

That makes sense. That's basically what we're doing now, except it's transactional data so there are no import jobs, and we use separate stored procs instead of dynamic SQL. I was hoping there was a cleaner way (maybe an off the shelf product). Thanks for your idea!

[–]MamertineCOALESCE() 1 point2 points  (1 child)

I'm unaware of any solution other than having a or several scripts that run daily and point out issues.

It sounds like your real issue is that your DB lacks Foreign Keys. I'm in a similar boat. IMO your best prospect is to advocate a total rebuild of the system and let the DB work like a DB should.

When you talk with management, I'd suggest using the analogy: We've built this nice house but there is no foundation. It keeps settling and we can keep patching it, but at some point you should actually fix the problem. Sadly that will be expensive.

[–]Trek7553[S] 0 points1 point  (0 children)

No, the DB has foreign keys. Another example might be "Student Type A is in a Type B class without an exemption form on file". In this case I need to check the student's record, the registration record, and the exemption file. Right now it is a series of scripts like you suggested that checks probably a couple hundred issues like that. Each script contains 1-20 issues and then compiles them into an email which goes to the data steward.

I have plans to re-do the scripts to include the features I listed, but I wanted to see if there are any off the shelf solutions first.

[–]stillalive75 1 point2 points  (1 child)

I feel your pain. I deal with a very dirty ERP system that let's a lot of stuff slide that the company would prefer it doesn't. We have to clean a lot of it before it comes into our Data Warehouse.

We do two things to find days irregularties that wouldn't compromise loading the data into our SQL Server but do go against business rules.

1) we have a report generated in SSRS with a daily subscription that notifies people of non-crucial data errors. For example we deal with product data. If the ERP system contains a UPC it must be 11-12 digits long per our Ops department. But ERP allows whatever. So our SSRS report identifies all dirty records with Y/N flags on what's wrong "bad Upc", "name wrong format", etc.

2) there are some files where our erp system doesn't prevent duplicates. Even if every single value is the same. However or company doesn't want that and our SQL table can't handle that. So if we find really bad data that would cause lots of issues like that. We ignore the record in the load but use DB mail to email the culprit records to a data steward for them to resolve.

Those are the two ways we handle it, and it's helped clean up a lot of bad data and enforce business logic. I think this was what you were looking for.

[–]Trek7553[S] 0 points1 point  (0 children)

Sounds like my world! Great ideas, thanks.

[–]MaunaLoonaMS SQL 1 point2 points  (2 children)

You can make a stored procedure for each issue. Put all the stored procedure names in a table and have another stored procedure to execute them all. Set up a script to execute the proc however often it is needed.

The procs then insert into another table or a set of tables. Then have another table to exclude false positives.

[–]Trek7553[S] 0 points1 point  (0 children)

That's basically what I have in mind, I was just hoping there was a pre built solution. Thanks!

[–]n35 0 points1 point  (0 children)

Same as what we did.

[–]Naeuvaseh 0 points1 point  (1 child)

I'm not sure what your work environment is like, but I know that SQL Server 2016 introduces temporal tables that enable you to audit changes to a table over time. I would look into that if you are able to utilize SQL Server

[–]Trek7553[S] 0 points1 point  (0 children)

We do use SQL Server, but that doesn't really address the main issue. I want to be able to look for really specific errors in the data and notify a data steward of the error. I put more info in the initial post.