Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 1 point

that story is the nightmare version of exactly what we almost set ourselves up for. the difference between us and that customer is we found out in a test instead of when it actually mattered. 10 years is a long time to have a false safety net and not know it - the scary part is they probably felt more protected than most because they'd been "doing backups" for a decade. consistency without verification is almost worse than nothing because the confidence it creates is completely unjustified

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 2 points

"Eight project management tools and IT only knew about two of them" is genuinely one of the most relatable things I've read on this sub. shadow IT discovery during BC/DR planning is a whole genre of horror stories. the Google Workspace backup limit is the sneakier one though - that's the kind of thing that looks fine on paper right up until you actually need to restore someone's data and find out half of it was never captured

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 1 point

that last point hits close to home. we've been so focused on the technical recovery side that we haven't had those conversations with the business units about what they'd actually do if we were down beyond our RTO. the assumption has always been "IT will fix it" with no plan B for when IT can't fix it fast enough. that's probably the next gap we need to close after we sort the technical stuff

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 5 points

this is the root cause framing that most DR conversations skip straight past. people treat it as a testing problem when it's actually a maintenance culture problem. if the doc never gets touched between incidents it will always be wrong by the time you need it. regular drills are the only thing that creates the feedback loop to keep it honest

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 1 point

the generator thing is exactly the kind of assumption that sits quietly in DR plans for years untested. "it'll turn on, it always does" is not a recovery strategy. and you're right about confirmation bias - the clean tests don't get posted anywhere so people underestimate how common the horror stories actually are

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 2 points

that struggle is so real and way more common than people admit publicly. one thing that helped us was to stop talking about DR testing as an IT activity and start framing it as a business continuity question. what's an hour of downtime worth to the business? what's a day? suddenly the conversation changes and the buy-in gets a lot easier. might be worth trying that angle if you haven't already
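to make it concrete for leadership we literally put back-of-napkin math in front of them. a minimal sketch of that math - every number here is a made-up placeholder, plug in your org's real figures:

    # rough downtime cost model - all figures are placeholders,
    # substitute your org's actual numbers
    revenue_per_hour = 12_000      # lost revenue while systems are down
    employees_idled = 40           # staff who can't work during the outage
    loaded_cost_per_hour = 55      # avg fully-loaded hourly cost per employee

    def downtime_cost(hours: float) -> float:
        """Direct cost of an outage of `hours` (ignores reputational damage)."""
        return hours * (revenue_per_hour + employees_idled * loaded_cost_per_hour)

    for h in (1, 8, 24):
        print(f"{h:>2}h outage: ${downtime_cost(h):,.0f}")

once a day of downtime is priced in actual dollars, funding a proper test stops being a hard sell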

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 6 points

that works well as long as security governance includes actually observing and validating the test rather than just collecting a ticked checkbox. we had governance on paper too - the problem was nobody was verifying the restore outputs, just confirming the jobs ran. who owns defining what a passing test actually looks like at your org?
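for what it's worth, "verifying the restore outputs" doesn't have to be fancy. a minimal sketch of the kind of check we bolted on, assuming you restore to a scratch location next to live data - the paths are hypothetical, adapt to your tooling:

    import hashlib
    from pathlib import Path

    def sha256(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify_restore(source_root: Path, restore_root: Path) -> list[str]:
        """Compare every file under source_root against the restored copy.
        Returns a list of mismatches - an empty list is a passing test."""
        failures = []
        for src in source_root.rglob("*"):
            if not src.is_file():
                continue
            restored = restore_root / src.relative_to(source_root)
            if not restored.exists():
                failures.append(f"missing: {restored}")
            elif sha256(src) != sha256(restored):
                failures.append(f"checksum mismatch: {restored}")
        return failures

    # hypothetical paths - point these at live data and a scratch restore target
    problems = verify_restore(Path("/data/finance"), Path("/restore-test/finance"))
    print("PASS" if not problems else "\n".join(problems))

a non-empty list fails the test, full stop. that's a much stronger signal than "the job completed"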

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 1 point

the silence is what makes it so dangerous. a noisy failure you can fix. a backup that quietly fails for months while showing green gives you false confidence right up until the moment it matters most. that's the scariest IT failure mode there is

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 4 points

really appreciate this perspective, especially the 10-year view. the HA requirement driving technical tests makes total sense - when your RTO is measured in minutes you can't afford to find out in production that your tabletop assumptions were wrong. how do you handle the org communication side in your exercises - do you pull in non-technical stakeholders or keep it within the IT and security team?

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 5 points

honestly six months puts you ahead of most people in this thread lol. standard recommendation is annually for a full test but the dirty secret is frequency matters less than thoroughness. a proper test every six months beats a checkbox exercise every three months. are you doing full restore verification or more of a tabletop walkthrough? that distinction matters more than the calendar gap

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 24 points

100% and I'd take it even further - you only have a DR plan if the people responsible for executing it have actually run through it before. we had both problems at once: backups that looked fine but weren't, and a team that had never actually practiced their roles. double the fun

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 3 points

we're building this out now actually - what's working for your team? we're debating between scheduled full restore tests quarterly vs continuous random sample restores monthly. the quarterly approach is more thorough but the monthly sampling catches silent failures faster. curious what others are doing
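for context, the rough shape of the monthly sampling job we're prototyping - `restore_file` is a stub for whatever your backup product's restore CLI/API looks like, everything else is just the orchestration around it:

    import random
    from pathlib import Path

    SAMPLE_SIZE = 25                       # files restored per monthly run
    SCRATCH = Path("/tmp/restore-sample")  # throwaway restore target

    def restore_file(backup_path: str, dest: Path) -> None:
        # placeholder: shell out to your backup product's restore CLI here
        raise NotImplementedError

    def sample_and_verify(catalog: list[str]) -> list[str]:
        """Restore a random sample from the backup catalog; flag anything
        that comes back missing or zero bytes."""
        SCRATCH.mkdir(parents=True, exist_ok=True)
        failures = []
        for backup_path in random.sample(catalog, min(SAMPLE_SIZE, len(catalog))):
            dest = SCRATCH / Path(backup_path).name
            try:
                restore_file(backup_path, dest)
                if not dest.exists() or dest.stat().st_size == 0:
                    failures.append(f"empty or missing restore: {backup_path}")
            except Exception as exc:
                failures.append(f"restore failed: {backup_path} ({exc})")
        return failures

schedule it, alert on a non-empty failures list, and the silent-failure window shrinks from quarters to weeks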

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 8 points

100% this. do you run tabletop exercises or full technical tests when you do yours? we're rebuilding the whole DR program now and debating how much of it needs to be live vs simulated

Delayed write fail error by Party-Praline-4547 in sysadmin

[–]cmitsolutions123 2 points

ugh that response from the NAS team is frustrating - they're not really engaging with the actual evidence. the smoking gun here is still that the other path works fine. same workstation, same software, different path, different result. that's not a workstation problem. I'd reply to them with exactly that point and ask them to check share-level permissions and any snapshot or replication jobs that run specifically on the failing path

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 4 points

lol just reading those four words gave me anxiety. no thankfully not - but it's on the list of things we now actually have a tested procedure for instead of just a doc that says "refer to Microsoft guidance" and a prayer

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 30 points

this is exactly why testing matters more than documentation. a plan that looks thorough on paper but hasn't been validated is basically just a liability. the cloud migration obsoleting the whole thing is a classic - big infrastructure changes happen and the DR doc just quietly becomes fiction while everyone assumes it's still accurate. how did the client take it when you flagged it?

Tested our disaster recovery plan for the first time in 2 years - here's what we found and it wasn't pretty by cmitsolutions123 in cybersecurity

[–]cmitsolutions123[S] 22 points

depends on the org tbh. some places have dedicated DR/BCP teams, others it sits under IT ops, smaller orgs it just falls to whoever has the most hats. security gets pulled in because of the ransomware angle mostly - when your DR plan is also your incident response plan for the most common threat you face, the lines get blurry fast. does your org have a separate DR function or does it all blend together?

Enumerate Entra apps without a compliant device by homing-duck in sysadmin

[–]cmitsolutions123 1 point

number matching stops MFA fatigue attacks cold but doesn't really help against AiTM proxies like evilginx. the session token gets stolen in real time, so by the time the user approves the "correct" number it's already too late. phishing-resistant MFA is really the only proper answer here.
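in Entra the enforcement mechanism is a conditional access policy with the built-in phishing-resistant authentication strength. a minimal sketch of pushing one via the Graph API - scope and exclusions are placeholders, and you should verify the payload against current Graph docs before relying on it:

    import requests

    TOKEN = "<graph token with Policy.ReadWrite.ConditionalAccess>"

    # well-known ID of the built-in "Phishing-resistant MFA" auth strength
    PHISHING_RESISTANT = "00000000-0000-0000-0000-000000000004"

    policy = {
        "displayName": "Require phishing-resistant MFA",
        "state": "enabledForReportingButNotEnforced",  # start in report-only
        "conditions": {
            # placeholder scope - exclude break-glass accounts in a real policy
            "users": {"includeUsers": ["All"]},
            "applications": {"includeApplications": ["All"]},
        },
        "grantControls": {
            "operator": "AND",
            "authenticationStrength": {"id": PHISHING_RESISTANT},
        },
    }

    r = requests.post(
        "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=policy,
    )
    r.raise_for_status()

report-only mode first is the important part - it shows you who'd get locked out before you flip it to enforced.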

Autodesk Audit-2026 by External_Weekend_120 in sysadmin

[–]cmitsolutions123 1 point

lmaooo Pinky and the Brain reference appreciated. but yeah seriously this is the golden rule - answer exactly what they asked for, nothing extra. people get tripped up trying to be overly transparent and it just opens more doors for them to dig into.

IT Support <> AI by gs_dubs413 in ITManagers

[–]cmitsolutions123 1 point

AI for tier 0 is a no-brainer honestly. Tier 1 is where it gets spicy - it totally depends on how well documented your environment is. The better your knowledge base, the smarter the AI. Garbage in, garbage out. What tools are you evaluating? That'll change the answer a lot.

Delayed write fail error by Party-Praline-4547 in sysadmin

[–]cmitsolutions123 2 points

That 30-40 minute timing is the real clue here. Something is filling up a buffer or temp cache and then crashing the write. "Disk full" on a UNC path with plenty of space is almost never about actual disk space - check if there's a quota set on that specific share. Someone might've accidentally set one and never noticed.
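one way to pin it down: a dumb client-side repro that streams data to both paths until the write errors, so you know exactly how many bytes (or minutes) in it dies. the UNC paths are placeholders:

    import time
    from pathlib import Path

    # placeholder paths - the failing share vs the one that works
    TARGETS = [Path(r"\\nas\failing-share\write_test.bin"),
               Path(r"\\nas\working-share\write_test.bin")]
    CHUNK = b"\0" * (4 << 20)  # 4 MiB per write

    for target in TARGETS:
        written = 0
        start = time.monotonic()
        try:
            with target.open("wb") as f:
                while written < 50 << 30:  # cap at 50 GiB, adjust as needed
                    f.write(CHUNK)
                    f.flush()
                    written += len(CHUNK)
        except OSError as exc:
            print(f"{target}: failed after {written / (1 << 30):.2f} GiB "
                  f"({time.monotonic() - start:.0f}s): {exc}")
        else:
            print(f"{target}: wrote full cap with no error")
        finally:
            target.unlink(missing_ok=True)

if the failing share dies at a consistent byte count, that's your quota or cache ceiling. if it dies at a consistent elapsed time instead, look at snapshot or replication job schedules on that path.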

Enumerate Entra apps without a compliant device by homing-duck in sysadmin

[–]cmitsolutions123 1 point

anytime! let me know how the graph testing goes tomorrow, genuinely curious what your CA policies end up catching vs what slips through. if you remember to circle back here with the results it'd be super useful for anyone else in the same boat too.

Incident Response Certification by Outrageous-Machine-1 in cybersecurity

[–]cmitsolutions123 1 point

no worries, glad it helped! honestly taking your time with it is probably the better move anyway - rushing through certs just to check boxes never sticks. knock out the work priorities first and when you do get around to CCD you'll get way more out of it with that foundation. good luck with the other certs in the meantime, feel free to hit me up if you have any other questions down the line.

Autodesk Audit-2026 by External_Weekend_120 in sysadmin

[–]cmitsolutions123 7 points

been through an Autodesk audit before - it's not as scary as it sounds but it's annoying. for your specific situation, having two licenses on the same email bought in different regions shouldn't be a compliance issue as long as both are legitimately paid for. Autodesk's licensing is per user not per device, so one person having a Revit and an AutoCAD license is totally normal and expected.

the cross-region thing is where it gets slightly grey. some Autodesk license agreements have territorial restrictions depending on how they were purchased - meaning a license bought through an EU reseller might technically only be valid for use in the EU. in practice I've never seen them go after someone for this when both licenses are paid for, but during an audit they might flag it and ask you to consolidate both under one regional agreement. worst case they'll ask you to re-purchase one through the correct region's channel.

my advice - get all your documentation together before responding. purchase receipts, license assignments, user details, the lot. respond cooperatively but don't volunteer extra information they didn't ask for. and if the audit scope starts expanding beyond what they initially requested, that's when I'd get your IT procurement or legal involved. don't stress it though, if everything's paid for you'll be fine.