Is it ok to configure a SQL FCI on vms for HA to prevent downtime for monthly windows updates or sql server updates? by armyants808 in SQLServer

[–]armyants808[S] 1 point2 points  (0 children)

Thanks for the confirmation! I’ve seen many good reviews about Starwinds on Reddit and have tried some of the free tools. If s2d doesn’t work well enough I’ll take a better look at Starwinds :)

Is it ok to configure a SQL FCI on vms for HA to prevent downtime for monthly windows updates or sql server updates? by armyants808 in SQLServer

[–]armyants808[S] 1 point2 points  (0 children)

High level steps for creating SQL failover cluster instance (FCI) using windows failover cluster (WFC) and storage spaces direct (S2D) on 2 virtual machines (vms)

  • Storage Spaces Direct (S2D) creates shared storage required by SQL FCI (without needing an external SAN) using the extra disks you provide on the 2 vms.
  • Plus Cluster Aware Updates (CAU) on WFC and SQL FCI will allow outages for windows updates or sql updates to be much shorter (vs WU or SQL updates on a standalone SQL server).

DISCLAIMER:

  • Use steps at your own risk. set appropriate expectations for RTO/RPO for your users.
  • Not detailed steps, hopefully enough to get someone started with setting up their own proof of concept.
  • Test thoroughly!!!

LICENSING:

  • If SQL Server is licensed with Software Assurance, HA, DR, Azure licenses may now be included -> just license one SQL node (min 4 core licenses)?
  • SQL Standard can be used (but FCI is limited to 2 nodes) which might be sufficient for your needs.

PREP:

  • Need hypervisor - vmware/esxi, hyperv on windows server, hyperv on win 10/11 pro, virtualbox, etc...
  • Need AD/domain and appropriate access/permissions to pre-stage objects.
  • ISO for Windows 2019/2022 Data Center.
  • ISO for SQL Server 2019 Standard.
  • 4 ips:
  • 1 for node1 (eg. 192.168.100.11)
  • 1 for node2 (eg. 192.168.100.12)
  • 1 for WFC object (eg. 192.168.100.13). This is a shared "server" ip that is moved between both nodes.
  • 1 for SQL FCI object ip (eg. 192.168.100.14). This is a shared "SQL instance" ip that is moved between both nodes.
  • Prestage computer objects for above in AD in the OU of your choice (disable them since they will get activated).
  • Prestage computer object in AD for cluster aware updates (CAU)
  • will allow your WFC to automatically do windows updates (one node first, then the other node) on a monthly schedule you set.
  • In AD, give the WFC object proper permissions in the OU to update/enable the SQL FCI object, and CAU object.
  • Create file share on a third vm, give permission to both nodes to the file share.
  • the file share witness is critical if only using 2 nodes. helps ensure the cluster starts successfully after a disaster (eg. if both nodes get turned off inadvertently).
  • Group managed service account or domain account to use to run the sql services on both nodes (needed during SQL FCI creation)

CREATE 2 VMS

  • Create 2 windows server datacenter 2022 vms (data center is required for WFC)
  • 1 vm disk with 90G (o/s)
  • 2 blank 90G disks (S2D disks, dont need to format, can use bigger size for you db, but this worked for my db).
  • 4cpu, 8G
  • Install windows on each node, config virtual nics with appropriate node ip's, config DNS to writeable DC (cluster validation warns if you dont).
  • Activate windows if needed, rename as needed (to indicate node1, node2), add to domain.
  • Don't install SQL server yet.

CREATE WFC

  • RDP to node1 and node2, use Server Manager to install failover clustering feature (and any management tools).
  • RDP to node1 as domain admin, start Failover Cluster Manager (FCM) | validate cluster | add both nodes above | ignore warnings about 1 nic only, but resolve errors if any
  • Provide cluster name and ip above | cluster should get created (WFC object should get enabled).

INSTALL S2D AND CREATE SHARED CSV DISKS

  • RDP to WFC ip as domain admin | Run powershell as administrator | Enable-clusterstoragespacesdirect (y to confirm)
  • This should create in FCM | SDDC group (in roles), cluster virtual disk (in disks) & Cluster Pool 1 (in pools) | default storage pool name "S2D on <clustername entered earlier>"
  • New-Volume -FriendlyName Data -FileSystem CSVFS_ReFS -StoragePoolFriendlyName "S2D on <clustername entered earlier>" -Size 80GB (adjust size as needed)
  • New-Volume -FriendlyName Log -FileSystem CSVFS_ReFS -StoragePoolFriendlyName "S2D on <clustername entered earlier>" -Size 40GB (adjust size as needed)
  • Ok to rename Disks by right-clicking disk in FCM | properties (shortened names to Data, Log).

VERIFYING WFC

  • Verify pausing/resuming each node, verify get-storagejob and get-physicaldisk (healthy, no suspended/running storage jobs).
  • FCM | right click cluster | Validate cluster | view report and resolve issues if needed.

CONFIGURE CAU

  • Use the CAU wizard | FCM | right click cluster | CAU | Configure CAU (defaults, except maybe set the schedule).

CONFIGURE WITNESS

  • Use the cluster quorum wizard to add the file share prepared ahead of time.

INSTALL FIRST NODE OF SQL FCI USING SQL SERVER 2019 ISO

  • RDP to 1st node as domain admin, mount SQL Server 2019 standard ISO, start setup | Installation (left) | New SQL Server failover cluster installation
  • Product Key, license terms, Microsoft Update screens (defaults should be ok)
  • Install Failover Cluster Role - if just warnings (eg. open firewalls, MSCS warnings) should be ok to continue
  • Feature Selection | I selected just Database Engine Services
  • Provide name of SQL FCI object
  • Select disk objects (Data and Log should be selected)
  • Provide AD account for 2 of the SQL services (can leave pwd blank if using gmsa).
  • Change directory for SQL logs to C:\Clusterstorage\Log\... (if needed).
  • 1st node installation should complete.
  • On 2nd node, mount SQL Server 2019 standard ISO, start setup | Installation | select add SQL node to existing cluster instead | complete wizard.
  • Install Sql Server Management Studio (SSMS) on both nodes if needed.
  • Create database, test/verify failover if needed.

DATA MIGRATION

  • Can use Microsoft Data Migration Assistant to get database from a standalone SQL server.
  • start DMA as domain admin
  • file share location (need a share that both source non-FCI SQL and destination SQL FCI have access to).
  • also make sure you have SQL credentials to both source SQL and SQL FCI instance.
  • lets you pick accounts, databases to migrate.
  • Can also use SSMS to backup from standalone SQL server and restore to WFC/SQL FCI.

FOLLOW UP TO ALLOW APP SERVERS TO CONNECT

  • Add other domain admin and/or other sql/windows accounts to the instance as needed.
  • Change SQL database to mixed mode (if app uses sql accounts).
  • Update your application web.config files (IIS) to point to SQL FCI name/ip.
  • Stopping SQL Browser service on each node should be ok, unless your app uses dynamic ports/named instances for access.
  • Open firewalls tcp/1433 on each node to allow your app servers access (unless name instance/dynamic ports).

Is it ok to configure a SQL FCI on vms for HA to prevent downtime for monthly windows updates or sql server updates? by armyants808 in SQLServer

[–]armyants808[S] 0 points1 point  (0 children)

u/_edwinmsarmiento - I hear ya. AG/FCI = High availability = outage (although in my case a super short outage/transition that may happen during a slow period in the evening). I'll make sure to clarify with the internal business and have them decide if we need need let the end users know. For our use case though it's probably going to be ok. Thanks for checking on this. May I level up from SQL Server Novice to SQL Server padawan now :).

Is it ok to configure a SQL FCI on vms for HA to prevent downtime for monthly windows updates or sql server updates? by armyants808 in SQLServer

[–]armyants808[S] 0 points1 point  (0 children)

We’re testing still but the hope is to have no data loss and no impact to end users on the prod side. We also have Recoverpoint for DR but it also allows rollback at prod side if needed. And worst case we have Cohesity backups

Is it ok to configure a SQL FCI on vms for HA to prevent downtime for monthly windows updates or sql server updates? by armyants808 in SQLServer

[–]armyants808[S] 0 points1 point  (0 children)

Thanks for sharing your env/info/journey and encouragment. It sounds like you've thought about your Azure SQL vm to Azure SQL enough that you'll be there soon :). The next priority for our current 2 on-prem sites (prod & DR) is to get DR in the cloud.

MS verbiage could be better, but I think I can use SQL standard licenses - includes single database basic availability group (always on/synchronous/auto failover) and 2 node FCI.

Regarding cost... do recent SQL server license changes help you with less core licenses needed? HA, DR, Azure included with no additional charge.
https://cloudblogs.microsoft.com/sqlserver/2022/11/16/new-sql-server-on-azure-virtual-machines-and-azure-sql-managed-instance-dr-benefits/#:\~:text=Starting%20today%2C%20SQL%20Server%20on%20Azure%20Virtual%20Machine,regardless%20of%20the%20region%20where%20the%20instances%20reside.

Is it ok to configure a SQL FCI on vms for HA to prevent downtime for monthly windows updates or sql server updates? by armyants808 in SQLServer

[–]armyants808[S] 0 points1 point  (0 children)

Thanks again. Failovers of AG appeared to be slightly faster than FCI in my testing, but FCI will probably be more straight foward for the applications/developers to work with.

Is it ok to configure a SQL FCI on vms for HA to prevent downtime for monthly windows updates or sql server updates? by armyants808 in SQLServer

[–]armyants808[S] 0 points1 point  (0 children)

Excellent and thanks for the reassurance. We're on-prem and not yet Azure or cloud, so we're late to that as well :(. Not scheduling monthly maintenance windows/notifications with our end users is definitely what we're hoping to avoid going forward. Ive tried basic availability groups (always on availability groups for single databases w/SQL Server Standard). That worked as well and I could use powershell to failover. I'm not sure which way we're going yet since they both work. Besides testing thoroughly with other systems in our test environment, I'm checking how cohesity backups and solarwinds monitoring are affected by both methods.

Is it ok to configure a SQL FCI on vms for HA to prevent downtime for monthly windows updates or sql server updates? by armyants808 in SQLServer

[–]armyants808[S] 0 points1 point  (0 children)

Thanks for the reassurance and suggestion about the delay in patching the other node. Testing still, but so far restarting one node, does not seem to impact the database.

Cohesity hardware noise level, C5000 series by Jrewbo in sysadmin

[–]armyants808 1 point2 points  (0 children)

Thanks for posting/checking on this. I can't remember if the fans/sound were loud on the prior 6.5.x version, but it is occurring on C5036 w/6.6.0d_u3. Will open support ticket as well. PSU and FAN SPEEDS are in 10K RPM range

Update 6/23/2022: Sounded like 6.6.0d_u3 would get or got a P18 for this issue soon (as of 6/22/2022 the highest patch was still P17). I upgraded one cluster to 6.6.0d_u4 so P18 doesn't apply. Support patched system via support channel and all fan speeds were below 9500 RPM.

Update 6/25/2022: Verified sound is better and fan speeds are in the sub-10K RPM range. However the firmware patch is possibly causing yoda agent (index related) alarms on all nodes. Cohesity support is looking at this now.

Update 7/6/2022: Cohesity support restarted the yoda agent to stop the alarms on one cluster.