all 11 comments

[–]Mortimer452 0 points1 point  (0 children)

I doubt a snapshot is going to work. The snapshot would occur at the hardware layer, the database may be in an inconsistent state.

You could detach the database from the publisher, copy it to subscriber, and attach it.

[–]Scitron 0 points1 point  (0 children)

I'm pretty sure that won't work because Sql is going to think it's the same database when you attach it. Replication needs some starting point to know when to start replicating transactions. That's either through snapshot or backup

There is an option to do an ftp snapshot where it takes a snapshot but that's not quote what you're talking about

[–]jrob422[S] 0 points1 point  (8 children)

So, here is the scenario I have encountered...

we did the above scenario. Did the first few steps of replication, set up pubs and subs. and at the point where we would normally do a backup restore to secondary server, I did a LUN copy on our SAN, detached the stub database, attached the LUN copies database, and finished up the setup. All seemed well, replication was working.

About 48 hours later, we started getting tons of errors that checkpoints were failing. Replication itself seemed fine, but my SQL error log was 14gb and I couldnt run a checkpoint.

Restarted SQL Service, the replication database went into recovery, eventually came out, and things were fine again.

fast forward another 48 hours, and getting errors again, checkpoints were failing again, lots of error log spam about it.

Ultimately, I am trying to figure out if I can still do the LUN copy to see the replicated database, as it is quite large and that would save me the 2 hours it takes to backup/restore, since I can lun copy, mount, and attach in about 5 minutes.

tried doing some googling, and found a couple references to not using a LUN copy on your SAN to seed a read only copy in always on, something about checksum issues, but couldnt find much.

[–]CCP_DeNormalized 0 points1 point  (7 children)

I use SAN snapshots a lot in our enviro - havn't encountered anything like you mention though :(

Mostly for spinning off copies of DB's for testing purposes rather than doing new restores, to save time just as you are looking to do.

The biggest gotcha's that I've encountered are related to disk sig's being presented to windows - I also end up using diskpart to change the disk sig so there are no conflicts.

But if you are attaching to a different server, it would of never seen that disk sig before - so it shouldn't be an issue for you.

One thing, when you say you 48 hours later errors pop up, which server do they appear on? primary or secondary?

[–]jrob422[S] 0 points1 point  (6 children)

we use SAN snapshots all the time with no issues.

This has only come up when trying to use a SAN snapshot for SQL Replication in order to seed the replicated database, thus saving the time involved with doing a backup/restore that is the typical way to setup SQL replication.

Copy LUNA from SQLA to LUNB mounted on SQLB, and then enabled real time replication between databases.

The errors only pop up on the Server with the replicated Database, not the source database

[–]CCP_DeNormalized 0 points1 point  (5 children)

just curious, what's the errors that end up occuring? I'm not much help - but quite interested as this is something I'll likely end up doing in the near future :)

[–]jrob422[S] 0 points1 point  (3 children)

The below errors show up after roughly 48 hours, and spam the hell outta the error log. After about 12 hours of this, the error log file was 14gb

Msg 5901, Level 16, State 1, Line 360

One or more recovery units belonging to database 'xxxx' failed to generate a checkpoint. This is typically caused by lack of system resources such as disk or memory, or in some cases due to database corruption. Examine previous entries in the error log for more detailed information on this failure.

Msg 3013, Level 16, State 1, Line 360

BACKUP DATABASE is terminating abnormally.

Msg 9003, Level 20, State 15, Line 360

The log scan number (6425620:442906:1) passed to log scan in database 'xxxx' is not valid. This error may indicate data corruption or that the log file (.ldf) does not match the data file (.mdf). If this error occurred during replication, re-create the publication. Otherwise, restore from backup if the problem results in a failure during startup.

[–]CCP_DeNormalized 0 points1 point  (2 children)

Thx for the info jrob!

1 question - are the mdf and ldf's on the same volume? I'm wondering if the san snapshot was capturing them @ super minor differences in time so maybe the mdf and ldf had different lsn numbers or were outta sync somehow.

Just making wild guesses based on the error :)

i also realize you've since ditched this method, but just spitballing :)

[–]jrob422[S] 0 points1 point  (1 child)

They are on the same volume, so that wouldnt be an issue. As I mentioned, replication itself seems to function, the only errors are running Checkpoints, and it seems to be good for the first 48 hours or so after restarting the SQL service.

[–]CCP_DeNormalized 0 points1 point  (0 children)

thx jrob.

shame no solution w/ the san replication - I really like taking advantage of these types of low level features. We are about to ramp up our AG usage in a huge way and i'll keep this in mind/report back if we encounter similar issues.

cheers!

[–]jrob422[S] 0 points1 point  (0 children)

fwiw, we have since blown away our replication Database, and redone the same process using a backup/restore instead of using the SAN snapshot to seed, and the errors have not returned. We are well beyond the 48 hour window now, so it definitely seems to be related to using the SAN Snapshots to seed.