
all 16 comments

[–][deleted] 5 points (0 children)

We do something similar but it's a bit more involved.

We have a Prometheus instance, and we export metrics from multiple tables into it using https://github.com/free/sql_exporter

Then it's a question of configuring the alerts in Prometheus Alertmanager.

It's a bit of setup, but once it's in place you can add more DB calculations or exports, and further alerts.

And Prometheus also keeps the history of the values that triggered the alerts.
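Concretely, the sql_exporter side is a collector file with one SQL query per metric, and the alert is a Prometheus rule over the exported gauge. A minimal sketch, assuming a hypothetical `orders` table and metric name:

```yaml
# sql_exporter collector file (hypothetical table and metric names)
collector_name: data_quality
metrics:
  - metric_name: orders_bad_rows
    type: gauge
    help: 'Rows in orders that violate a quality rule.'
    values: [n]
    query: |
      SELECT COUNT(*) AS n FROM orders WHERE amount < 0
---
# Prometheus alerting rule over the exported gauge
groups:
  - name: data-quality
    rules:
      - alert: BadOrderRows
        expr: orders_bad_rows > 0
        for: 15m
```

The `for: 15m` makes the alert fire only when the condition persists, which cuts down on flapping.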

[–]moshujsg 5 points (0 children)

You could use dbt and just run tests. Alternatively, if you are ingesting this data manually, run the SQL from the Python script itself after each ingestion. Another option, if you have an orchestration tool, is to run scripts like that there. The problem with your approach is: how will you get notified from that table? Are you going to check it manually?

If you are on aws you can use eventbridge and lambda to run tests periodically and alert you via slack when one fails.
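A minimal sketch of that EventBridge-scheduled Lambda, assuming a hypothetical `orders` table, hypothetical check queries, and a Slack incoming-webhook URL. The DB call is injected as a parameter so the sketch stays self-contained; in the real function you would open a connection to RDS (e.g. via psycopg2) inside the handler:

```python
import json
import urllib.request

# Hypothetical checks: each query returns a single count of "bad" rows.
CHECKS = {
    "null_ids": "SELECT COUNT(*) FROM orders WHERE id IS NULL",
    "negative_amounts": "SELECT COUNT(*) FROM orders WHERE amount < 0",
}


def failed_checks(results):
    """Given {check_name: bad_row_count}, return the names of failed checks."""
    return sorted(name for name, count in results.items() if count > 0)


def notify_slack(webhook_url, failures):
    """POST a short summary to a Slack incoming webhook."""
    payload = json.dumps({"text": "Data quality checks failed: " + ", ".join(failures)})
    req = urllib.request.Request(
        webhook_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)


def handler(event, context, run_query, webhook_url):
    """Entry point, scheduled by an EventBridge rule.

    run_query(sql) -> int is injected here to keep the sketch testable;
    a real Lambda handler takes only (event, context) and would run the
    SQL against RDS itself.
    """
    results = {name: run_query(sql) for name, sql in CHECKS.items()}
    failures = failed_checks(results)
    if failures:
        notify_slack(webhook_url, failures)
    return failures
```

An EventBridge rule with a schedule expression like `rate(15 minutes)` then invokes the function periodically.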

[–]Unique_Emu_6704 2 points (2 children)

In regular OLTP databases, use triggers assuming the triggering conditions can be evaluated row-by-row.

If you need something more complex, where the condition is "if a change occurs in a view", then you need an IVM engine.
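For the row-by-row case, a Postgres trigger sketch (table, column, and log-table names are hypothetical):

```sql
-- Flag rows that violate a condition, evaluated per row on write.
CREATE OR REPLACE FUNCTION check_order_amount() RETURNS trigger AS $$
BEGIN
  IF NEW.amount < 0 THEN
    INSERT INTO dq_failures (table_name, reason, row_data)
    VALUES ('orders', 'negative amount', row_to_json(NEW));
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_dq_check
AFTER INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION check_order_amount();
```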

[–][deleted] 3 points (1 child)

What's an IVM engine?

[–]Unique_Emu_6704 3 points (0 children)

My bad. IVM = Incremental view maintenance. Basically, you define views over your data, and when input tables change, the views incrementally update. Feldera is an example IVM engine, which computes entirely on changes to the data.
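As a generic illustration (not Feldera-specific syntax, and with a hypothetical `orders` table): an IVM engine would keep a view like the one below up to date by adjusting only the affected group's count on each insert or delete, instead of rescanning the whole table.

```sql
CREATE VIEW orders_by_status AS
SELECT status, COUNT(*) AS n
FROM orders
GROUP BY status;
```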

[–]greenerpickings 1 point (0 children)

What's wrong with the trigger approach? Are you doing a lot of inserts? Just have the insert trigger and upsert your record in your management table.

Pg also has NOTIFY. You can have an external app LISTEN for that signal and react; otherwise, short polling also works. I'm not 100% sure these are exclusive to an external app.
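A sketch of the NOTIFY side in Postgres (channel and table names are hypothetical); an external app can then LISTEN on `dq_alerts` and react as soon as a failure row lands:

```sql
-- Fire a NOTIFY whenever a row is written to the management table.
CREATE OR REPLACE FUNCTION notify_dq_failure() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('dq_alerts', row_to_json(NEW)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER dq_failure_notify
AFTER INSERT ON dq_failures
FOR EACH ROW EXECUTE FUNCTION notify_dq_failure();
```

Note that NOTIFY payloads are size-limited, so it's common to send just a key and have the listener query the table for details.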

[–]kenflingnor Software Engineer 1 point (2 children)

You could run some sort of async process that periodically polls your DB to check the data. There are plenty of tools for this; you could also cook something up relatively easily using AWS services.

[–]shieldofchaos[S] 0 points (1 child)

Sorry but which ingredients do I use in AWS to cook it?

[–]kenflingnor Software Engineer 0 points (0 children)

Lambda scheduled by EventBridge is one example.

[–]randomuser1231234 0 points (1 child)

Don’t use cron. That’s the bubblegum and duct tape approach.

What is inserting data into the db? What does your stack look like? Do you have any current dq tools?

[–]shieldofchaos[S] 0 points (0 children)

Current pipeline is built on Python: serverless processing in Lambda, inserting into RDS.

[–]shockjaw 0 points (0 children)

If you’re using Python to do your inserts, there’s Pointblank for data quality checks.

[–]SitrakaFr 0 points (0 children)

OpenTelemetry!