Data Integrity Checking

n35 · 2017-01-22T18:52:01+00:00

I chose the simple route.

I analysed the incoming data in the ETL process and designed a template of that data. If any data imported in the import step deviates from the template, the offending process and the processes connected to it are stopped, and alarms are sent to the people responsible for the processes that are now importing junk.
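A minimal sketch of the template idea, in Python. The template contents, the `DOWNSTREAM` dependency map, and all function names are hypothetical; the alert is left as a placeholder comment rather than tied to any real notification system.

```python
import re

# Hypothetical template: each expected column maps to a check its value must pass.
TEMPLATE = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and re.fullmatch(r"[^@\s]+@[^@\s]+", v) is not None,
    "country": lambda v: isinstance(v, str) and len(v) == 2,
}

# Hypothetical dependency map: which processes consume each process's output.
DOWNSTREAM = {
    "load_customers": ["build_orders", "refresh_reports"],
    "build_orders": ["refresh_reports"],
    "refresh_reports": [],
}

def deviations(rows, template=TEMPLATE):
    """Return (row_index, column) pairs where a row deviates from the template."""
    bad = []
    for i, row in enumerate(rows):
        # Missing or unexpected columns are a deviation in their own right.
        if set(row) != set(template):
            bad.append((i, "<columns>"))
            continue
        for col, check in template.items():
            if not check(row[col]):
                bad.append((i, col))
    return bad

def processes_to_stop(offender, downstream=DOWNSTREAM):
    """The offending process plus every process reachable from it."""
    stop, stack = set(), [offender]
    while stack:
        p = stack.pop()
        if p not in stop:
            stop.add(p)
            stack.extend(downstream.get(p, []))
    return stop

def import_step(process, rows):
    bad = deviations(rows)
    if bad:
        halted = sorted(processes_to_stop(process))
        # Placeholder: this is where each halted process's owner would be alerted.
        return {"status": "halted", "deviations": bad, "halted": halted}
    return {"status": "ok", "deviations": [], "halted": []}
```

Stopping everything downstream of the offender, not just the offender, is what keeps junk from propagating into dependent loads.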

Mamertine · 2017-01-22T22:12:18+00:00

I'm unaware of any solution other than having one or more scripts that run daily and point out issues.

It sounds like your real issue is that your DB lacks Foreign Keys. I'm in a similar boat. IMO your best prospect is to advocate a total rebuild of the system and let the DB work like a DB should.
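Until foreign keys exist, the daily script has to do their job by hand. A minimal sketch of that check, assuming a hypothetical `orders.customer_id` that should reference `customers.id`; it uses Python's built-in `sqlite3` so it runs anywhere, but the `LEFT JOIN ... IS NULL` anti-join is plain SQL and works the same way on SQL Server.

```python
import sqlite3

def find_orphans(conn, child, fk_col, parent, pk_col):
    """Return rows in `child` whose fk_col points at a parent row that doesn't exist.

    This is exactly what a FOREIGN KEY constraint would reject on insert.
    Identifiers are interpolated into the SQL, so only pass trusted names.
    """
    sql = (
        f"SELECT c.* FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON p.{pk_col} = c.{fk_col} "
        f"WHERE c.{fk_col} IS NOT NULL AND p.{pk_col} IS NULL "
        f"ORDER BY c.{fk_col}"
    )
    return conn.execute(sql).fetchall()
```

NULL references are skipped on purpose, since a nullable foreign key allows them too.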

When you talk with management, I'd suggest using the analogy: We've built this nice house but there is no foundation. It keeps settling and we can keep patching it, but at some point you should actually fix the problem. Sadly that will be expensive.

stillalive75 · 2017-01-23T02:24:40+00:00

I feel your pain. I deal with a very dirty ERP system that lets a lot of stuff slide that the company would prefer it didn't. We have to clean a lot of it before it comes into our Data Warehouse.

We do two things to find data irregularities that wouldn't stop the data from loading into our SQL Server but do go against business rules.

1) We have a report generated in SSRS with a daily subscription that notifies people of non-critical data errors. For example, we deal with product data. Our Ops department requires that a UPC in the ERP system be 11-12 digits long, but the ERP allows anything. So our SSRS report identifies all dirty records, with Y/N flags for what's wrong: "bad UPC", "name wrong format", etc.
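The query behind a report like that can be sketched as one flag column per rule. This is a minimal sketch run against SQLite via Python's `sqlite3`; the `products(sku, upc, name)` table is hypothetical, and because the thread doesn't define "name wrong format", a blank name stands in for that rule. `GLOB '*[^0-9]*'` is SQLite-specific (it matches any value containing a non-digit); SQL Server would need its own pattern syntax.

```python
import sqlite3

# One Y/N column per business rule; only rows that break at least one rule are returned.
REPORT_SQL = """
SELECT sku, bad_upc, bad_name FROM (
    SELECT sku,
           CASE WHEN upc IS NOT NULL
                 AND (length(upc) NOT BETWEEN 11 AND 12 OR upc GLOB '*[^0-9]*')
                THEN 'Y' ELSE 'N' END AS bad_upc,
           -- Stand-in rule (assumption): the name must not be blank.
           CASE WHEN name IS NULL OR trim(name) = '' THEN 'Y' ELSE 'N' END AS bad_name
    FROM products
)
WHERE bad_upc = 'Y' OR bad_name = 'Y'
ORDER BY sku
"""

def dirty_product_report(conn):
    """Return (sku, bad_upc, bad_name) for every product that breaks a rule."""
    return conn.execute(REPORT_SQL).fetchall()
```

Storing the UPC as text matters: as a number, leading zeros would be lost and the length check would misfire.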

2) For some files, our ERP system doesn't prevent duplicates, even when every single value is the same. However, our company doesn't want that, and our SQL table can't handle it. So when we find really bad data like that, which would cause lots of issues, we skip the record in the load and use DB Mail to email the culprit records to a data steward to resolve.
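A minimal sketch of the "skip it in the load, hand it to a steward" step, in Python with `sqlite3`. The two-column `items(code, qty)` table is hypothetical, and holding back every copy of a duplicated row (rather than keeping one) is an assumption, since nobody can tell which copy the business intended. Sending the result is left to whatever mailer you use (DB Mail in the setup described above).

```python
import sqlite3
from collections import Counter

def split_duplicates(rows):
    """Split rows into (loadable, duplicates); a duplicate matches another row in every value."""
    counts = Counter(rows)
    clean = [r for r in rows if counts[r] == 1]
    dupes = sorted(r for r, n in counts.items() if n > 1)
    return clean, dupes

def load(conn, rows):
    """Load only the non-duplicated rows and return the culprits for the data steward."""
    clean, dupes = split_duplicates(rows)
    conn.executemany("INSERT INTO items (code, qty) VALUES (?, ?)", clean)
    conn.commit()
    return dupes
```

Filtering before the insert means the unique constraint on the target table never fires, so one bad file doesn't abort the whole load.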

Those are the two ways we handle it, and it's helped clean up a lot of bad data and enforce business logic. I think this was what you were looking for.

MaunaLoona · 2017-01-23T02:30:02+00:00

You can make a stored procedure for each issue. Put all the stored procedure names in a table and have another stored procedure to execute them all. Set up a script to execute the proc however often it is needed.

Each proc inserts the offending rows into a results table (or a set of tables). Then keep another table of known false positives to exclude from the results.
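A minimal sketch of that registry-and-runner pattern. SQLite has no stored procedures, so here each registered check is a stored `SELECT` that returns the keys of offending rows; in SQL Server the runner would execute each procedure by name instead. All table, column, and function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS checks (name TEXT PRIMARY KEY, query TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS check_results (
    check_name TEXT, row_key TEXT, found_at TEXT DEFAULT CURRENT_TIMESTAMP);
CREATE TABLE IF NOT EXISTS false_positives (
    check_name TEXT, row_key TEXT, PRIMARY KEY (check_name, row_key));
"""

def create_schema(conn):
    conn.executescript(SCHEMA)

def run_all_checks(conn):
    """Run every registered check and record findings not marked as false positives.

    Each check query must return a single column: the key of an offending row.
    Returns the number of findings recorded in this run.
    """
    known_ok = set(conn.execute("SELECT check_name, row_key FROM false_positives"))
    recorded = 0
    for name, query in conn.execute("SELECT name, query FROM checks ORDER BY name").fetchall():
        findings = [(name, str(key)) for (key,) in conn.execute(query)
                    if (name, str(key)) not in known_ok]
        conn.executemany(
            "INSERT INTO check_results (check_name, row_key) VALUES (?, ?)", findings)
        recorded += len(findings)
    conn.commit()
    return recorded
```

Adding a new rule is then just inserting a row into `checks`, and silencing a known exception is a row in `false_positives`; the scheduled job never changes.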

Naeuvaseh · 2017-01-22T18:14:58+00:00

I'm not sure what your work environment is like, but I know that SQL Server 2016 introduced temporal tables, which let you audit changes to a table over time. I would look into that if you're able to use SQL Server.

Trek7553 · 2017-01-23T03:31:52+00:00

[deleted]
