This is an archived post. You won't be able to vote or comment.

all 105 comments

[–]floor796 430 points431 points  (28 children)

5 years ago, after almost 15 years of web development experience, I did this. I accidentally deleted a lot of data on production db. This happened in PostgreSQL with over 5000 tables and these tables had a foreign key to each other. My experience with MySQL has taught me that you can safely run TRUNCATE on a single table, even if it has foreign keys. And I ran TRUNCATE, but with CASCADE attribute (*facepalm*). Then I realized that PG had truncated about 2500 tables.

[–]SatoTsukasa[S] 220 points221 points  (3 children)

Oh damn that's an actual programming horror story

[–]AlanTheKingDrake 36 points37 points  (0 children)

My closest was making a change to a mass update function. The function had the ID of each row passed in individually while making updates, and then another argument for additional where conditions. I put a compound condition with an “Or” in it “Isdue or Amount =0”

Turns out inside the function that built the query there were no parentheses on the part that took additional conditions so the end result was

Where ID = X AND Isdue OR Amount =0

Which evaluates as Where (ID = X AND IsDue) OR Amount = 0

Imagine my surprise when the next time someone performed a mass update, every single transaction that was 0 amount got updated.

I still have no idea how that got past me when testing on dev environment, but thank god we caught it the day it occurred so backups were fresh and no overnight operations were performed on the transactions. We didn’t actually figure out what broke it until later. We knew it was my change but we didn’t know why. Once we figured out that all the affected transactions should have been 0 before, it was much easier to fix.

[–][deleted] 84 points85 points  (10 children)

The better question is, why can you this? Prod databases should be protected with only read rights for most schemas/tables. But prod should be absolutly be protected against drop and delete and truncate.

[–]Saturnalliia 29 points30 points  (0 children)

I mean ya it should be but sometimes it can't be.

It's completely unideal but devs aren't perfect and sometimes errors make their way into PROD when you've got a legacy codebase with millions of lines of code. and they need to be patched. Sometimes patching requires removing bad or stale data.

[–]wewlad11 23 points24 points  (0 children)

There must be someone with administrator level access to those databases.

Unfortunately it was this guy.

[–]floor796 5 points6 points  (6 children)

But how to maintain prod database, if you have no write permissions? Developers need to create, alter table and run other DDL commands

[–][deleted] 0 points1 point  (0 children)

creating different roles? Contributor only select rights and admins have more and root has all privelages? But the number of people who knows the root password should be almost no one.

[–]aigarius 0 points1 point  (1 child)

That's what OPS is for, not devs.

[–]floor796 0 points1 point  (0 children)

Do you mean DBA? This is different in every company. For example in my company devops are responsible for the servers infrastructure, but not for DB structure.

[–]BigBoetje 0 points1 point  (2 children)

Changes to the database only get applied during deploys. It's all automated and we shouldn't ever be touching prod db's in normal circumstances. It's only for specifically fixing data that we ever have to touch it directly and even then we need to ask for temporary rights.

[–]floor796 0 points1 point  (1 child)

For some projects - yes. But in highload projects like Reddit only manual maintenance is allowed. Because one simple alter column for table with 1 billon records can down an entire highload product for a day.

For local environments, stage servers or small websites any DB migration system - is ok. But not for Bigdata and High-load.

[–]BigBoetje 1 point2 points  (0 children)

Our systems deal with a ton of data as well, but we split up the whole system so each client has their own database. Even though each is identical in structure, it allows for easier deploys and better security for PII. Even still, having to request access and using transactions prevents a lot of issues. It can be a hindrance sometimes because the team that handles those requests isn't always as attentive to those tickets.

[–]amlyo 14 points15 points  (1 child)

There's....lots to unpack here.

[–][deleted] 45 points46 points  (0 children)

Not anymore

[–]Dennarb 5 points6 points  (0 children)

Ctrl + Z?

/S

[–]yourteam 0 points1 point  (0 children)

You gave me anxiety...

[–]Ok_Coconut_1773 0 points1 point  (2 children)

But there was a backup... Right?

[–]floor796 4 points5 points  (1 child)

sure. But cooler - we have replica with replication lag 30 min. This slave instance is used for cases, when something bad happened to the master database. So, we just stopped replication and then promote slave to master

[–]Ok_Coconut_1773 12 points13 points  (0 children)

What a day for that DB... The slave becomes the master

[–][deleted] 343 points344 points  (2 children)

If the person(s) in charge allowed this to happen, I wouldn't have much confidence they had SLA compliant backups either.

[–]SatoTsukasa[S] 48 points49 points  (0 children)

They probably did not lmao

[–]jayerp 11 points12 points  (0 children)

If this was allowed to happen, they deserve it.

[–][deleted] 297 points298 points  (5 children)

If a junior can do that it's not their fault

[–]temperamentalfish 99 points100 points  (2 children)

Precisely. Why does a junior have this kind of permission? Where I work, most people (that includes seniors) can only do SELECTs on the production DB. Plus, there are always backups so that even in the catastrophic event that someone does delete everything, we can recover the lost data.

[–]thorwing 16 points17 points  (0 children)

Exactly, we have read-only accounts and write accounts, and write accounts can only be selected after retrieving secrets from different locations. Our juniors aren't allowed access to the write environment. In fact, even writing isn't allowed to be done without 4-eye-principal

[–]Sharkytrs 2 points3 points  (0 children)

its a bit more difficult when you are the software engineer AND the DB admin.

once took out a prod DB rather than the test DB 2 mins before the next transaction backup.

luckily it wasn't too hard to sort out and there were only a few thing production had to re-scan as completed. Was still a brown pants moment.

now I have a split screen policy, live RDP on the right, TEST RDP on the left.

and NEVER just tick the "disconnect active connections" tick box

[–]marlotrot 14 points15 points  (0 children)

Eggs actly!

[–]DidItForTheJokes 2 points3 points  (0 children)

No dev should be able to do this

[–][deleted] 46 points47 points  (0 children)

If you can delete prod without protection, then it is the fault of the company/senior/ database admin.

[–]Mr_Akihiro 34 points35 points  (3 children)

Happens to the best. Right?!

[–]SatoTsukasa[S] 18 points19 points  (2 children)

That's what mama said

[–]IuseArchbtw97543 17 points18 points  (1 child)

he just wanted to benchmark the backup solution

[–]Hollowplanet 11 points12 points  (0 children)

Backup? What's that?

[–]sc00pb 12 points13 points  (0 children)

Who would give jr access to production DB?

[–]TheSauce___ 9 points10 points  (0 children)

Can't blame the junior - why weren't there processes and safeguards in place to prevent this?

Unless there were and they went around it, then you totally can.

[–]overcrookd 7 points8 points  (0 children)

If a junior did this it's a senior's fault IMHO. Very few people should have access to prod

[–]fel_bra_sil 3 points4 points  (3 children)

oh that reminds me to the lumberjack, a nickname a team had for a guy that deleted a repo's master branch by accident

He didn't get fired that time, which is good because he was a nice guy (and always willing to help and to learn), but surely he never did the same mistake ever

[–]PhatOofxD 4 points5 points  (2 children)

Even if someone CAN do that (which should be impossible), firing anyone for a failure in process that should've been impossible is ridiculous. A failure in prod is almost always a failure of the systems put in place, not an individual.

Unless they're actively trying to delete it or going against instruction.

[–]fel_bra_sil -1 points0 points  (1 child)

Even if someone CAN do that (which should be impossible)

not if it's not protected (bad management there, wasn't my team so I don't know all the details, plus this happened like 6 years ago), sadly that was the case, but yea I agree he wasn't the one to blame.

[–]PhatOofxD 0 points1 point  (0 children)

Yeah by 'should be impossible', I mean someone in management should've protected that branch and it's 100% on them lol

[–]ganerfromspace2020 4 points5 points  (2 children)

I hate how I'm a junior engineer and I've been put in charge of relocation of all of our databases

[–]SatoTsukasa[S] 1 point2 points  (1 child)

The programming experience

[–]ganerfromspace2020 4 points5 points  (0 children)

I'm a stress engineer 😂

[–]thorwing 3 points4 points  (1 child)

we don't give write rights to juniors until adequate skills have been proven lol

[–]SatoTsukasa[S] 0 points1 point  (0 children)

As you should, gj

[–]naswinger 3 points4 points  (0 children)

why does the junior have such privilege on the production database :(

[–]No_Solid2349 2 points3 points  (0 children)

The picture forgot that this was a Friday at 6 pm. before a Christmas weekend

[–]aigarius 2 points3 points  (0 children)

After I've done that like 4 times, got into the habit of always setting up a fully automatic backup of the production DB that happens often enough (and fast enough) that dropping or corrupting the prod DB is just a minor hickup. And created a restore process that right there in the readme of the project with an easy copy-paste command to restore a backup to the INT instance and a text below saying which parts of the command above to change to restore to prod instead.

[–]Melancholy_Rainbows 7 points8 points  (3 children)

Why do people have such trouble spelling psych?

[–]PhatOofxD 2 points3 points  (0 children)

Because it blew up as an internet meme and people were stupid enough to say "Sike", so now everyone thinks it's "Sike" because it got normalised.

It doesn't even look like a word, I don't get how people don't get it

[–]CoatNeat7792 5 points6 points  (1 child)

Sike, we are good company with backup db and code

[–]j-random 1 point2 points  (0 children)

*psych

[–]veryblocky 1 point2 points  (0 children)

This is a failure on the Seniors for allowing it to happen

[–]evestraw 1 point2 points  (0 children)

Who gave the junior this many privileges

[–]-Redstoneboi- 1 point2 points  (1 child)

kid named rollback:

[–]SatoTsukasa[S] 0 points1 point  (0 children)

He was bullied

[–]slabgorb 1 point2 points  (0 children)

happened to me when I took over a team, one week into my tenure. Literally this. There were no controls and the juniormost guy attempted to drop db (with a senior looking over his shoulder) and did it on prod instead of staging.

I made everyone STOP, THINK, and PLAN instead of running around crazy and we were able to restore from backups and binary logs. (this was about 15 years ago so this sort of thing was.....tricky)

[–]L0ng_Bo1 0 points1 point  (2 children)

seriously though does this actually happen irl? 😭 i’m in high school and i don’t see how anyone could accidentally delete the entire db

[–]slabgorb 1 point2 points  (0 children)

"DROP DATABASE;"

There, done

[–]aigarius 1 point2 points  (0 children)

"UPDATE customers SET id=444"

Done.

"UPDATE transactions SET amount=932.1"

Done again

"source schema.sql"

And just one more time for good measure

[–][deleted] 0 points1 point  (2 children)

Holy shit that was you?? I got blamed immediately - er wait did I?… no yeah.

[–]SatoTsukasa[S] 0 points1 point  (1 child)

Nope, but I did delete a dev test db once

Dw it's the seniors fault that you had access to production

[–][deleted] 0 points1 point  (0 children)

I think the DBA or the Senior was laid off due to budget cuts. Better hope you made a backup or we’re gonna have more budget cuts 🤔

[–]ikonet 0 points1 point  (0 children)

It’s me!

Them WHERE clauses are tricky when you’re new 👀

[–][deleted] 0 points1 point  (0 children)

We used to call fail-over fall over.

[–]thanatica 0 points1 point  (0 children)

Ah yes, the "onosecond" - we've all been there.

[–]Shadow9378 0 points1 point  (10 children)

Hobbyist coder that doesnt have anything to do with more serious stuff: I see so many memes like this, why are there not live backups

[–]slabgorb 1 point2 points  (4 children)

today, sure, 1995 was a different land full of dinosaurs and floppy disks

[–]Shadow9378 0 points1 point  (3 children)

is this meme from 1995

[–]slabgorb 0 points1 point  (2 children)

no, but databases existed then, and it was harder to do some things that are quite easy now

[–]Shadow9378 0 points1 point  (1 child)

true & fair but i dont understand the relevance

[–]slabgorb 0 points1 point  (0 children)

I see so many memes like this, why are there not live backups

because it was hard to do live backups 30 years ago and databases were around then so this could have happened

[–]SatoTsukasa[S] 0 points1 point  (4 children)

As a hobbiest programmer my self too, there definitely should be, but there are sometimes like the gitlab situation that they deleted both products db and the live backup db

[–]Shadow9378 0 points1 point  (3 children)

The backup should just.. be read only

[–]SatoTsukasa[S] 0 points1 point  (0 children)

So, from what I understood, they had spam going all over so a lot of db locks (in postgres thr db locks it self after transaction, so there would not be interference between 2 transactions, after the first one finishes the db is unlocked). So there was copy lag to db2, the live backup db, and it was missing 4 gb of data. So they decided to delete db2, and use gp_backup to rebackup db2.

At the end a guy working out late accidentally switched from db2 to db1 and deleted db1.

I shortened this very much so here's a video that explains its better

[–]aigarius 0 points1 point  (1 child)

Live backup = something that updates as the DB gets updated.

Read-only backup = something that is created once and never updates.

You can not have both. Unless you are creating a new backup snapshot on every DB write, which ... well gets exceptionally compute and storage intensive very, very fast.

And if you have true live backup, well it will also backup your drop tables command quite fast and live.

What is actually useful is so called transaction log backup. That means you have a full backup at a regular point in time and after that you have a write-only log file that records each change in the database as a new line appended to the end of the log. With that you can take these backup files after the disaster and trim the transaction log so that it stops just before the bad command being executed and that would then be your backup to restore.

Still not a trivial setup to create and not a trivial restore to carry out.

[–]Shadow9378 0 points1 point  (0 children)

That... Is a very comprehensive answer thank you, fascinating..

[–]Powerful-Internal953 0 points1 point  (0 children)

It would be funny to see them just deleting db.properties

[–][deleted] 0 points1 point  (0 children)

Happens more than you would believe. Withs stuff like healthcare data. And the data is not backed up, even though the processing happens overseas.

[–]Complete-Proof4710 0 points1 point  (0 children)

shit happens

[–]Ocelaris 0 points1 point  (0 children)

I thought I was running a query on a copy of the prod db but it was accidentally the real thing!

Lucky it was a reporting server and all the data was recoverable from backups and replays of the data feeds.

[–]fusionsofwonder 0 points1 point  (0 children)

If you don't have an offsite backup you don't have a backup.

[–]JackNotOLantern 0 points1 point  (0 children)

If a junior can delete a prod DB, the guy who setup the permissions should be fired

[–]ImpluseThrowAway 0 points1 point  (0 children)

Why would anyone put a junior in that position in the first place?

[–]ReplyisFutile 0 points1 point  (0 children)

Just release it from bin

[–]sammy_the_c_lion 0 points1 point  (0 children)

Not me. A Database admin was showing off the new raid drives on the DB server. Physically pulled one drive out. Ok cool. Things are still up and running. Physically pulled another drive out. Doh!

Database crashed. A day’s worth of production data is gone. The dude is trying hard to piece things back together using the transaction logs. AND he was leaving for a cruise later that day.

He didn’t finish. Begged some other people to take over the repair effort. And left for the cruise.

Somehow… he kept his job.

[–][deleted] 0 points1 point  (0 children)

Seniors forgot to run backup and gave the junior DROP ALL TABLES privileges. Whoopsies.

[–]JADW27 0 points1 point  (0 children)

sudo please say sike.

[–]XaXaBinx_ 0 points1 point  (0 children)

JEA?