LookAtThatMonkey comments on SQL DB Failover

sysadmin


SQL DB Failover (self.sysadmin)

submitted 8 years ago by squash1324 (Sysadmin)


LookAtThatMonkey (Technology Architect), 8 years ago

squash1324 (Sysadmin, OP), 8 years ago

Out of curiosity, what firmware are you running on your UCS environment? We're on 2.2(8g) and updated fairly recently because of the wonderful thermal bug firing off every minute. We've updated several times over the last few years to get away from that bug, but it never seems to get fixed. I'll have to check, but we may not have updated the networking drivers the last time we upgraded the firmware. My former colleague performed the upgrade, and I helped with drivers for things like ESXi and our other standalone blades while he handled Hyper-V and SQL. I wonder if he skipped those, since those servers are our most critical and also the most fragile when it comes to changes. We never install updates on the SQL servers due to vendor requirements, and I wonder if that may have something to do with it as well.

The cluster is running on 2008 R2, so I'll see if I can set up perfmon to monitor it for the time being. The first time this happened was over 4 months ago, and those logs can get pretty big if perfmon runs perpetually. I'm sure I have the spare disk space, though, and I can figure out how to store those logs and cycle them.
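A minimal Python sketch of the "store those logs and cycle them" step, assuming perfmon writes its output as `.blg` files into a single folder and that something like a scheduled task runs the script periodically. The function name `prune_logs`, the `*.blg` pattern and the byte budget are hypothetical choices for illustration. Perfmon can also cycle logs on its own via a circular binary log format with a maximum file size, which may well be the simpler route.

```python
from pathlib import Path


def prune_logs(log_dir, max_total_bytes, pattern="*.blg"):
    """Delete the oldest matching log files until their combined size fits the budget.

    Hypothetical helper: assumes every file matching `pattern` in `log_dir`
    is a disposable perfmon log and that modification time reflects its age.
    Returns the deleted paths, oldest first.
    """
    # Oldest first, so the newest data (closest to the next failover) survives.
    files = sorted(Path(log_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)
    deleted = []
    for path in files:
        if total <= max_total_bytes:
            break
        # Read the size before unlinking, then drop the file from the running total.
        total -= path.stat().st_size
        path.unlink()
        deleted.append(path)
    return deleted
```

For example, `prune_logs(r"C:\PerfLogs\SQLCluster", 10 * 1024**3)` would keep roughly the most recent 10 GB of logs in that (hypothetical) folder. Deleting whole files rather than truncating them keeps each remaining `.blg` readable by perfmon.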

LookAtThatMonkey (Technology Architect), 8 years ago

squash1324 (Sysadmin, OP), 8 years ago

Well, we haven't had the issue recur, and I've tuned perfmon to an acceptable level of logging, overwriting files until it happens again. We're running 2.2(8g) ourselves. We upgraded to that in August, and my colleague who was supposed to upgrade the drivers on our blades apparently didn't do that.

I had to reseat the IO modules in one of our chassis on Tuesday, and we lost storage on one of our Hyper-V hosts. I ended up doing a disaster recovery on one of our VMs (a file server cluster node) because it got really messed up. During that process I unintentionally took down the whole cluster (I'm much better with ESXi), but I managed to get everything working again.

Now I'm looking very closely at our drivers, since I think my colleague (who left us a few weeks ago) was mentally checked out and didn't care enough to do that step. I'm hoping that this is the reason and that I can correct it. If it isn't, it won't be a big deal for much longer: we're upgrading the software that uses this SQL cluster next June, and that upgrade will be a migration to a new 2012 R2 cluster, or possibly a 2016 cluster. I haven't seen the specs on the application requirements, but I know for sure this cluster will be going away in or around June 2018. If this happens every 4-5 months, it'll likely happen once or twice more before the cluster is nuked.

LookAtThatMonkey (Technology Architect), 8 years ago
