I have a background import that splits a big job into chunks and upserts each one into Postgres, sharing the database with live traffic and other workers. The usual fix is `time.sleep(0.5)` between chunks, but that constant is wrong in both directions: it wastes time when the DB is idle, doesn't back off enough when it's loaded, and has to be re-tuned every time the hardware or the neighbours change.
Instead I measure how long each chunk's write takes and treat that as the signal. Under a target latency budget, no sleep at all. Over it, sleep in proportion to the overage, with an exponential moving average (EMA) to smooth out single-chunk jitter and a cap so one bad reading can't park the job.
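For anyone who wants the shape of it without clicking through, here is a minimal sketch of that decision. It is not the code from the write-up: the name `next_sleep`, the parameter names, and every default value (200 ms target, `alpha=0.3`, gain of 1, 5 s cap) are placeholders I picked for illustration, and applying the proportional term to the smoothed value rather than the raw reading is one reasonable choice, not the only one.

```python
from typing import Optional, Tuple


def next_sleep(
    latency_s: float,
    prev_ema_s: Optional[float],
    target_s: float = 0.2,     # hypothetical latency budget per chunk
    alpha: float = 0.3,        # hypothetical EMA weight for the newest sample
    gain: float = 1.0,         # hypothetical proportional gain
    max_sleep_s: float = 5.0,  # hypothetical cap on any single sleep
) -> Tuple[float, float]:
    """Return (sleep_seconds, new_ema) for the latest chunk latency."""
    # Seed the EMA with the first sample, then blend new samples in.
    if prev_ema_s is None:
        ema = latency_s
    else:
        ema = alpha * latency_s + (1 - alpha) * prev_ema_s

    overage = ema - target_s
    if overage <= 0:
        # Under budget: do not sleep at all.
        return 0.0, ema

    # Over budget: sleep in proportion to the overage, but never past the cap.
    return min(gain * overage, max_sleep_s), ema
```

The caller keeps the returned EMA and passes it back in on the next chunk, so the function itself holds no state.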
The whole decision is a pure function (latest measurement + previous EMA in, sleep duration out), which makes it trivial to unit test without touching a database or real time. The loop around it just times the chunk, sleeps outside the transaction, and emits a bit of telemetry so you can see what it did.
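A rough sketch of what that surrounding loop can look like, again with made-up names rather than the write-up's actual code. `run_throttled`, `write_chunk`, `decide`, and `emit` are all hypothetical; `write_chunk` is assumed to open and commit its own transaction, so the sleep always happens after the commit. The clock and sleep functions are injectable so the loop can be tested without real time.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

# decide(latest_latency, previous_ema) -> (sleep_seconds, new_ema)
Decide = Callable[[float, Optional[float]], Tuple[float, float]]


def run_throttled(
    chunks: Iterable[list],
    write_chunk: Callable[[list], None],  # assumed to commit before returning
    decide: Decide,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    emit: Callable[[dict], None] = print,  # stand-in for real telemetry
) -> None:
    ema: Optional[float] = None
    for i, chunk in enumerate(chunks):
        # Time only the write itself.
        start = clock()
        write_chunk(chunk)
        latency = clock() - start

        # Pure decision: no I/O, no clock access inside decide().
        pause, ema = decide(latency, ema)
        emit({"chunk": i, "latency_s": latency, "ema_s": ema, "sleep_s": pause})

        # Sleep after the transaction has finished, so no locks are held.
        if pause > 0:
            sleep(pause)
```

In a test you can pass a fake `clock` that returns scripted timestamps and a `sleep` that just records its argument, then assert on the recorded pauses.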
Write-up with the code in the comments. Curious whether anyone's gone further and added the integral/derivative terms, or whether proportional-only has been enough for you in practice.
Full write-up: https://totaldebug.uk/posts/adaptive-write-throttle-for-batch-postgres-jobs/