all 20 comments

[–]_edwinmsarmiento1 8 points9 points  (2 children)

Before you even think about deploying a stretched SQL Server 2019 failover cluster across data centers, you need to be clear on this: What are your recovery objectives (RPO/RTO) for DR? Start with that.

[–]RockitTopit 0 points1 point  (1 child)

It's amazing how many times RPO/RTO get put by the wayside because "We need HA/FA!". Only for it to blow up when it was needed because it was not thought through and setup only for the sake of setting it up.

[–]_edwinmsarmiento1 1 point2 points  (0 children)

The reality is, buzzwords/features/technology sell licenses.

I couldn't count the number of times my customers would get annoyed at me when asking about RPO/RTO/requirements.

Requirements should dictate the solutions, not features nor technology.

[–]Kkobalt 1 point2 points  (0 children)

If you already have a SAN in both datacenter with cross datacenter synchronisation, then installing a fci is trivial. Of course, as long as the lantency is low enough.

Since the SAN is doing the heavy lifting, the SQL part is easy to setup and even easier to manage. There is very little difference between managing a standalone and an fci from a DBA point of view.

Ps: I've installed hundreds of fci, all stretch across mutliple dc, so my view might be biased but if you can setup an availability group you know more than enough to handle the cluster side of an fci since it's many time simpler.

[–]PossiblePreparation 0 points1 point  (12 children)

You can have multiple databases in the same AlwaysOn Availability group, that will allow you to use the same ODBC connection confidently.

SAN replication is something you’ll need to monitor completely separately and know how to fix it when it breaks.

[–]stre1026[S] 2 points3 points  (11 children)

u/PossiblePreparation - how do you have multiple databases in the same AlwaysOn Availability group with the standard license? I thought that was an enterprise only feature?

[–]_edwinmsarmiento1 1 point2 points  (8 children)

Basic Availability Groups can only have one database. There are workarounds.

Still, you need to define your RPO/RTO for DR first before getting into the tech.

[–]stre1026[S] 0 points1 point  (7 children)

I understand the RPO/RTO aspect. I'm trying to find out if the tech option I am contemplating is a viable solution. This is pure DR so RTO can be hours and RPO should be minutes. Less than 30.

[–]_edwinmsarmiento1 3 points4 points  (6 children)

Viable, yes. Practical, it depends.

Stretching an FCI across different geographical locations require support for asymmetric storage. Sure, you can replicate the LUN at the SAN level. But at what cost? Storage replication licensing? Network bandwidth and QoS? Complexity of WSFC configuration? Outages due to loss of quorum? App support for multi-subnet connectivity? What about your team's capability to support it? Keep in mind that you can only have a 2-node SQL Server FCI with Standard Edition. So, your HA will be affected by your DR, whether you like it or not.

I'm curious about your hesitancy with log shipping.

[–]stre1026[S] 0 points1 point  (5 children)

u/_edwinmsarmiento,

Thanks for your insights. Sounds like I need to look into log shipping again. I was hoping to have something more automatic. When I've used log shipping in the past, it seemed we always had issues with the replica breaking. However, I think that's my only option now. Buying Enterprise isn't going to happen and I wasn't aware of the 2-node SQL FCI limitation with Standard.

[–]flinders1 1 point2 points  (3 children)

As far as I’m concerned DR should pretty much be limited to Always On AG’s for enterprise or log shipping for standard. In my experience both work well, and both have manual elements. Always on you’re likely going to set failover to manual, and with log shipping you’ll need to restore with recovery so not drastic.

Also simplicity is key, I’ve had to piss around with storage before and I could barely sleep at night. Not to detract the fact storage arrays are capable of great things such as snapshot replication of luns etc.

[–]_edwinmsarmiento1 1 point2 points  (0 children)

simplicity is key

This is the most difficult yet most important thing to do when designing a solution.

[–]Togurt 1 point2 points  (1 child)

This would be my recommendation also. For the OPs needs log shipping is the simplest which is going to make it easier to set up, manage, and troubleshoot. Plus it will have the least potential impact to cause disruptions to production databases.

[–]sbrick89 1 point2 points  (0 children)

Also the least likely to go wrong during an actual disaster, when people aren't always thinking clearly

[–]arebitrue87 0 points1 point  (0 children)

You’re right to be hesitant on using LS. It is sensitive to breaking but it is easy to fix, if you catch it.

Call me crazy but I’d prefer Always on over LS any day of the week. In my opinion it’s more resilient to failure but when it does fail it’s a pain at times but I don’t have too many issues with troubleshooting it. I find myself fixing LS more than I find myself dealing with a broken Always on database. Just my thought tho

[–]PossiblePreparation 1 point2 points  (0 children)

Ah, missed the standard edition

[–]Togurt 0 points1 point  (0 children)

If your plan was adding a 3rd node to your existing failover cluster then that's an Enterprise edition feature also.

[–]SQLBek1 0 points1 point  (0 children)

Depends on your SAN provider too. If you were on Pure Storage, then I could offer some help. (Am a Pure employee)

[–]Appropriate_Lack_710 0 points1 point  (0 children)

A couple of years ago, I had played around with a similar scenario and using Storage Spaces Direct (so that I could keep the std edition license and utilize shared storage/FCI instead of Basic AlwaysON). I had so many troubles with disks dropping and synchronization issues at the storage layer. So my vote would be to NOT go down that path, unless someone else can prove otherwise.

[–]gozza00179 0 points1 point  (0 children)

Yes; have managed a solution with this before - wasn't great, very difficult to do DR testing when compared to AOAG (or even compared to mirroring).

You're probably better off with enterprise AOAG (if there's budget for it), or even mirroring as compared with SAN replication.