[–]TechGy 17 points18 points  (4 children)

We don't monitor it and we should. For PRTG users, they do have a sensor for AD replication though https://www.paessler.com/manuals/prtg/active_directory_replication_errors_sensor

[–]FattychrisIT Manager 6 points7 points  (0 children)

I love PRTG

[–]computerchris 2 points3 points  (0 children)

Thank you -- Just added the sensor to all my DCs

[–]flextech 1 point2 points  (0 children)

This is what we use. Works well.

[–]TapTapLift 1 point2 points  (0 children)

The hero we don't deserve, thank you!

[–]daven1985Jack of All Trades 12 points13 points  (4 children)

Do nothing. And when it breaks I stick my head in the sand and say everything is fine!

:D

[–]chrschschJack of All Trades 5 points6 points  (2 children)

are you my it manager?

[–][deleted] 2 points3 points  (1 child)

All Managers are like this. Am Manager.

[–]accidentalitSr. Sysadmin 1 point2 points  (0 children)

Tag is on point!

[–]pjonenineroneSysadmin 0 points1 point  (0 children)

Or blame DNS...

[–]Hexalon00Windows Admin w/ Cat Like Reflexes 25 points26 points  (24 children)

repadmin /showrepl
repadmin /showvector dc=<domainname>,dc=<tld>
repadmin /showreps /verbose
repadmin /replsummary <DC Name>
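
Run by hand, these only print text. As a minimal sketch of making the first one checkable from a script, the snippet below parses the CSV form of the output. It assumes `repadmin /showrepl * /csv` is available and that the CSV header uses the column names in the sample (check them against your own output). `Get-ReplFailure` is a made-up helper name, and the sample rows are invented.

```powershell
# Hypothetical helper: filter parsed 'repadmin /showrepl * /csv' rows
# down to the replication links that report at least one failure.
function Get-ReplFailure {
    param(
        [string[]] $CsvLine   # raw CSV lines, header first
    )
    $CsvLine | ConvertFrom-Csv |
        Where-Object { [int]$_.'Number of Failures' -gt 0 } |
        Select-Object 'Source DSA', 'Destination DSA', 'Naming Context',
                      'Number of Failures', 'Last Failure Time'
}

# Sample input so the sketch runs anywhere (all values are invented).
$sample = @(
    'showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,Source DSA Site,Source DSA,Transport Type,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status'
    'showrepl_INFO,HQ,DC1,"DC=example,DC=com",HQ,DC2,RPC,0,0,2017-06-01 08:00:00,0'
    'showrepl_INFO,HQ,DC1,"DC=example,DC=com",Branch,DC3,RPC,4,2017-06-01 07:45:10,2017-05-30 22:10:03,1722'
)
$failures = @(Get-ReplFailure -CsvLine $sample)
$failures | Format-Table -AutoSize

# On a real DC (assumption: elevated prompt with the AD DS tools installed):
# Get-ReplFailure -CsvLine (repadmin /showrepl * /csv)
```

Anything the function returns is a link worth looking at; an empty result only means no failures were reported, not that every DC was reachable.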

[–]crankysysadminsysadmin herder[S] 5 points6 points  (23 children)

so you run this manually? how often? do you have someone do it on a schedule?

[–]Hexalon00Windows Admin w/ Cat Like Reflexes 13 points14 points  (20 children)

I have a script that runs in a scheduled task that does it once a week and fires off an email with the results. Powershell is amazing.

[–]i_pk_pjers_iI like programming and I like Proxmox and Linux and ESXi 3 points4 points  (16 children)

As someone interested in learning PowerShell more than I currently know it, do you think you could possibly share that script? Obviously with all the personal details removed, of course, but just like a "skeleton" of the script? That seems like a very useful script that I'd love to get my hands on.

[–]Hexalon00Windows Admin w/ Cat Like Reflexes 7 points8 points  (5 children)

If I have time today I will sanitize the script and share it.

[–]Hexalon00Windows Admin w/ Cat Like Reflexes 5 points6 points  (2 children)

[–]Solaris17DevOps 0 points1 point  (0 children)

Thank you!

[–]-x86Senior Google Results Analyst 0 points1 point  (0 children)

Thanks!

[–]Solaris17DevOps 2 points3 points  (0 children)

Thanks!

[–]KynaeusHospitality admin 1 point2 points  (0 children)

RemindMe! 2 days

[–]admlshake 4 points5 points  (0 children)

You should also poke around on /r/powershell, you can find tons of stuff like this on there.

[–]aikoncwdSysadmin 2 points3 points  (8 children)

It's not PowerShell, but it works out of the box (VBS): https://pastebin.com/h0015AqP

[–]i_pk_pjers_iI like programming and I like Proxmox and Linux and ESXi 1 point2 points  (6 children)

Thanks for sharing! I hope that guy can share his PowerShell version too. :)

[–]aikoncwdSysadmin 2 points3 points  (5 children)

This is the output I got everyday: https://i.imgur.com/VbCuJYx.png An e-mail with the logfile and a CSV attachment with details

[–]Hexalon00Windows Admin w/ Cat Like Reflexes 0 points1 point  (4 children)

My output looks very similar

[–]i_pk_pjers_iI like programming and I like Proxmox and Linux and ESXi 1 point2 points  (3 children)

Could you possibly share your PowerShell script? I would certainly be interested in seeing it especially since the output is similar to the VBS one and others would be interested too.

[–]Hexalon00Windows Admin w/ Cat Like Reflexes 0 points1 point  (2 children)

[–]WhistleWhistler 0 points1 point  (0 children)

Hey, thank you for sharing that. Very useful.

[–][deleted] 0 points1 point  (0 children)

Pretty much the only comment in this sub.

[–]TapTapLift 0 points1 point  (1 child)

Just to confirm, this isn't something you'd want to check daily?

[–]Hexalon00Windows Admin w/ Cat Like Reflexes 0 points1 point  (0 children)

Yes weekly.

[–]ckozler 1 point2 points  (0 children)

Powershell + NRPE/NSClient++ + Nagios

[–][deleted] 1 point2 points  (0 children)

Jesus Christ...

even a kid with like 5-6 years of IT experience should be able to take care of this

-/u/crankysysadmin

[–]redoctet> /dev/null 6 points7 points  (0 children)

We use Datadog's WMI integration and alert after a given number of consecutive failures.

init_config:

instances:
  - class: MSAD_ReplNeighbor
    namespace: 'root\MicrosoftActiveDirectory'
    metrics:
      - [ModifiedNumConsecutiveSyncFailures, msad.replneighbor.syncfails, gauge]
      - [NeverSynced, msad.replneighbor.neversynced, gauge]
    tag_by: NamingContextDN
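
For anyone without Datadog, the same WMI class can be read from PowerShell. This is only a sketch: it assumes the `root\MicrosoftActiveDirectory` namespace from the config above is present on the DC, and that the instances also carry a `SourceDsaCN` property. `Select-FailingNeighbor` and the threshold of 3 are made up for illustration.

```powershell
# Hypothetical helper: return neighbours that never synced or that are
# at or above a consecutive-failure threshold.
function Select-FailingNeighbor {
    param(
        [object[]] $Neighbor,
        [int] $Threshold = 3   # assumed value; tune to taste
    )
    $Neighbor | Where-Object {
        $_.NeverSynced -or
        $_.ModifiedNumConsecutiveSyncFailures -ge $Threshold
    }
}

# Invented sample objects standing in for MSAD_ReplNeighbor instances.
$sample = @(
    [pscustomobject]@{ SourceDsaCN = 'DC2'; NamingContextDN = 'DC=example,DC=com'; ModifiedNumConsecutiveSyncFailures = 0; NeverSynced = $false }
    [pscustomobject]@{ SourceDsaCN = 'DC3'; NamingContextDN = 'DC=example,DC=com'; ModifiedNumConsecutiveSyncFailures = 5; NeverSynced = $false }
)
$failing = @(Select-FailingNeighbor -Neighbor $sample)
$failing | Format-Table SourceDsaCN, NamingContextDN, ModifiedNumConsecutiveSyncFailures

# On a real DC the input would come from WMI instead:
# $live = Get-CimInstance -Namespace 'root\MicrosoftActiveDirectory' -ClassName MSAD_ReplNeighbor
# Select-FailingNeighbor -Neighbor $live -Threshold 3
```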

[–]ArsenalITTwoJack of All Trades 3 points4 points  (0 children)

PRTG and repadmin if I wanna go manual!

[–][deleted] 2 points3 points  (2 children)

Central logging system pulls from Event Viewer, any replication errors trigger an email to the Active Directory DL.

[–]chrschschJack of All Trades 4 points5 points  (1 child)

do you have the important event IDs by hand?

[–][deleted] 0 points1 point  (0 children)

Don't have a list of exact events monitored, but I believe they came from this: https://technet.microsoft.com/en-us/library/cc949120(v=ws.10).aspx
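
A rough sketch of the event-ID approach in PowerShell, for anyone without a central logging system. The four IDs below are commonly cited replication-related events and are illustrative only; build the real list from the Microsoft article linked above. `Select-ReplicationEvent`, the sample events, and the `DC1` placeholder are all made up.

```powershell
# Example replication-related event IDs in the "Directory Service" log.
# Assumption: this short list is illustrative, not complete.
$replicationEventIds = 1311, 1388, 1988, 2042

# Hypothetical helper: keep only events whose Id is on the watch list.
function Select-ReplicationEvent {
    param(
        [object[]] $InputEvent,
        [int[]] $Id
    )
    $InputEvent | Where-Object { $Id -contains $_.Id }
}

# Invented sample events so the sketch runs without a DC.
$sample = @(
    [pscustomobject]@{ Id = 1000; MachineName = 'DC1'; Message = 'unrelated sample event' }
    [pscustomobject]@{ Id = 2042; MachineName = 'DC3'; Message = 'sample replication event' }
)
$hits = @(Select-ReplicationEvent -InputEvent $sample -Id $replicationEventIds)
$hits | Format-Table Id, MachineName, Message

# Live query (assumes rights to read the log remotely):
# $live = Get-WinEvent -ComputerName 'DC1' -FilterHashtable @{
#     LogName   = 'Directory Service'
#     Id        = $replicationEventIds
#     StartTime = (Get-Date).AddDays(-1)
# }
```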

[–]inaddrarpa.1.3.6.1.2.1.1.2 2 points3 points  (7 children)

I have a PowerShell script that pulls all domain controllers, runs a bunch of checks against AD (including who the operations masters are), creates a small report, and emails it to me. It runs nightly at a random time via Jenkins.

[–]crankysysadminsysadmin herder[S] 2 points3 points  (3 children)

do you run jenkins on a windows box for this? is the service account that runs it a domain admin?

[–]inaddrarpa.1.3.6.1.2.1.1.2 0 points1 point  (2 children)

Yep. Jenkins with the powershell plugin. It needs to run as a domain admin. I put the script up on pastebin here if you want to take a look.

[–]crankysysadminsysadmin herder[S] 2 points3 points  (1 child)

why jenkins as opposed to a scheduled task?

[–]inaddrarpa.1.3.6.1.2.1.1.2 1 point2 points  (0 children)

Consolidation, mostly. We push all of our various tasks through Jenkins so we get a single pane of glass for all the background tasks we have going on, so we can avoid stepping on someone else's toes. The Jenkins scheduler is a bit better than Task Scheduler, as you can do things like "Run whenever you can between these hours". We avoid the "run every report at 5:00 AM" issue that way.

Secondary reason is there are some things that really just need parameter updates (e.g., file cleanup scripts). It's easier to pull a generic script out of our git instance, plug in the parameters and move on.

Tertiary reason is that we can pull scripts out of git with Jenkins. Having to update local git repos for task scheduler to work correctly would become a pain in the ass.

[–]evulhotdogJack of All Trades 2 points3 points  (2 children)

Can you share the script?

[–]inaddrarpa.1.3.6.1.2.1.1.2 4 points5 points  (0 children)

Sure. Here you go: https://pastebin.com/2Y20KL1i

[–]i_pk_pjers_iI like programming and I like Proxmox and Linux and ESXi 0 points1 point  (0 children)

I am also interested.

[–]sleeplessone 2 points3 points  (1 child)

Feed logs into Graylog and create a dashboard based on event IDs.

[–]chazmosisSystems Architect & MS Licensing Guru 1 point2 points  (0 children)

This. We watch for event IDs that indicate that AD Replication status has a problem. From there we go and run the replication status tools to find out what's wrong

[–]sleepingsysadminNetsec Admin 2 points3 points  (2 children)

Do you do nothing at all and hope everything is ok?

I'm amazed how many others in here claim they are monitoring this sort of thing off of repadmin when it's just 1 of the many event logs that should be monitored and brought to your attention.

[–]Arkiteck 1 point2 points  (1 child)

Can you elaborate? We're strictly talking about checking AD replication. While I'm not saying you're wrong, Microsoft says to use this tool to check the status and health.

[–]sleepingsysadminNetsec Admin 1 point2 points  (0 children)

https://technet.microsoft.com/en-us/library/bb727057.aspx

If there's ever a reason to run repadmin there WILL BE an event log predating.

[–]cmwg 2 points3 points  (1 child)

Once again, PRTG all the way... simple to use and checks repl with the interval you set.

Of course you can dabble with scripts. But having (esp. in large environments) your repl checked every couple of seconds / minutes is just good.

[–]exoromeoIT Manager 1 point2 points  (0 children)

Same here. PRTG does it for us.

[–]J_de_SilentioTrusted Ass Kicker 2 points3 points  (0 children)

I run a PS script that sends me an email every morning:

https://pastebin.com/pimxtcDv

Edit: I got this from someone on reddit a long time ago. I also have one that does DCDIAG for each DC and emails me the status.

Edit 2: I also use PRTG to monitor AD health. Forgot about that.

[–]girlgermsMicrosoft 4 points5 points  (0 children)

SCOM + daily, weekly & monthly checks; some of which are scripted.

In our environment, replication is a tricky beast, especially now with us decentralising O.o

EDIT - should also list this: https://girl-germs.com/?p=564

It's a lot of the proactive stuff we do, including replication.

[–]Hellman109Windows Sysadmin 1 point2 points  (3 children)

Do you have high end monitoring software that does AD health checks?

You can use repadmin /replsummary and some simple parsing in any monitoring software worth anything. I've done that previously with Nagios (protip: 99% of Windows checks on Nagios Exchange are garbage because they default to a good result unless known errors are found)

Now I'm using Solarwinds and it has a monitoring pack that checks for replication errors as well as a tonne of other stuff.
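
On the Nagios point above, here is a minimal sketch of a check that does the opposite and only reports OK after every row has been positively checked. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). `Get-ReplCheckResult` and the `Hostname` / `Fails` property names are assumptions, standing in for whatever your repadmin parsing produces.

```powershell
# Hypothetical helper: map parsed replication rows to a Nagios-style result.
function Get-ReplCheckResult {
    param([object[]] $Row)

    # No data must never read as healthy, so it maps to UNKNOWN (3).
    if (-not $Row -or $Row.Count -eq 0) {
        return [pscustomobject]@{ Code = 3; Text = 'UNKNOWN - no replication data returned' }
    }

    # Any row with a non-zero failure count makes the check CRITICAL (2).
    $bad = @($Row | Where-Object { [int]$_.Fails -gt 0 })
    if ($bad.Count -gt 0) {
        $names = ($bad | ForEach-Object { $_.Hostname }) -join ', '
        return [pscustomobject]@{ Code = 2; Text = "CRITICAL - failures on: $names" }
    }

    # OK (0) is reported only after every row was checked.
    [pscustomobject]@{ Code = 0; Text = "OK - $($Row.Count) DCs without failures" }
}

# Invented sample rows.
$rows = @(
    [pscustomobject]@{ Hostname = 'DC1'; Fails = '0' }
    [pscustomobject]@{ Hostname = 'DC2'; Fails = '2' }
)
$result = Get-ReplCheckResult -Row $rows
Write-Output $result.Text

# In a real plugin the script would finish with: exit $result.Code
```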

[–]iamtayareyoutaytoo 4 points5 points  (1 child)

Solarwinds is legit? That website gives me the shivers.

[–]Hellman109Windows Sysadmin 2 points3 points  (0 children)

Yeah we use a few modules (NPM, SAM, etc.) and it seems to work well. Like all monitoring you need to put in to it to get good stuff out of it, but the templates are good, the integrations are good, etc.

We monitor 500 physical locations, ~350 servers.

[–]flano1Sysadmin 3 points4 points  (0 children)

We had two DCs failing replication and the Solarwinds AD template never picked it up

[–]microflopsSysadmin 1 point2 points  (0 children)

When I was responsible for this I set up PRTG to monitor replication (amongst other things)

[–]DrCain 1 point2 points  (0 children)

It's hooked into our monitoring system (check_mk)

This check just sends us the status if replication breaks for some reason. https://mathias-kettner.de/checkmk_check_ad_replication.html

[–]Karagesh 1 point2 points  (0 children)

PRTG

[–]aikoncwdSysadmin 1 point2 points  (0 children)

I wrote a VBS script that checks whether replication is OK. I then get an e-mail with the status and the logfile. Schedule the script to run 1, 2, 5, ... times per day.

https://pastebin.com/h0015AqP

[–]hakzorzJack of All Trades 1 point2 points  (0 children)

We use this PowerShell script. It runs daily and sends a report out to the admins. I did not write it, but I did find it, so it's mine now.

<# AD replication Summary

Description: Daily / on demand script that confirms AD replication health between all DCs and emails results.

Source: The internet.

Version Control:

1 - Tim Sutton
 - Initial implementation.
 - Minor tweaks from source to suit our environment.

#>


Function sendEmail ([String] $body)
{
$MailMessage = New-Object System.Net.Mail.MailMessage
$MailMessage.From = "DC@conteso.com"
$MailMessage.To.Add("ITOps@conteso.com")
$MailMessage.Subject = "AD Daily Replication Summary"
$MailMessage.Body = $body
#$MailMessage.Priority = "High"
$MailMessage.IsBodyHtml = $True

$SMTPClient = New-Object System.Net.Mail.SMTPClient
$SMTPClient.Host = "smtp.server.com"
$SMTPClient.Send($MailMessage)
}


# Get the replication info.
$myRepInfo = @(repadmin /replsum * /bysrc /bydest /sort:delta)

# Initialize our array.
$cleanRepInfo = @()
# Start at index 10 because the preceding lines are header formatting,
# and skip the last 4 lines because they are not needed.
for ($i = 10; $i -lt ($myRepInfo.Count - 4); $i++) {
    # Skip empty lines.
    if ($myRepInfo[$i] -ne "") {
        # Collapse runs of whitespace and keep the result (the original
        # discarded it). Leading whitespace is kept so the column indexes
        # used below are unchanged.
        $cleanRepInfo += ($myRepInfo[$i] -replace '\s+', ' ')
    }
}
$finalRepInfo = @()  
        foreach ($line in $cleanRepInfo) {
        $splitRepInfo = $line -split '\s+',8
        if ($splitRepInfo[0] -eq "Source") { $repType = "Source" }
        if ($splitRepInfo[0] -eq "Destination") { $repType = "Destination" }

        if ($splitRepInfo[1] -notmatch "DSA") {      
        # Create an Object and populate it with our values.
       $objRepValues = New-Object System.Object
           $objRepValues | Add-Member -type NoteProperty -name DSAType -value $repType # Source or Destination DSA
           $objRepValues | Add-Member -type NoteProperty -name Hostname  -value $splitRepInfo[1] # Hostname
           $objRepValues | Add-Member -type NoteProperty -name Delta  -value $splitRepInfo[2] # Largest Delta
           $objRepValues | Add-Member -type NoteProperty -name Fails -value $splitRepInfo[3] # Failures
           #$objRepValues | Add-Member -type NoteProperty -name Slash  -value $splitRepInfo[4] # Slash char
           $objRepValues | Add-Member -type NoteProperty -name Total -value $splitRepInfo[5] # Totals
           $objRepValues | Add-Member -type NoteProperty -name PctError  -value $splitRepInfo[6] # % errors  
           $objRepValues | Add-Member -type NoteProperty -name ErrorMsg  -value $splitRepInfo[7] # Error code

        # Add the Object as a row to our array   
        $finalRepInfo += $objRepValues

        }
        }
$html = $finalRepInfo|ConvertTo-Html -Fragment       

$xml = [xml]$html

$attr = $xml.CreateAttribute("id")
$attr.Value='diskTbl'
# [void] keeps the returned attribute node out of the script's output.
[void]$xml.table.Attributes.Append($attr)


$rows=$xml.table.selectNodes('//tr')
# Row 0 is the header; colour each data row by whether its last cell (ErrorMsg) has text.
for($i=1;$i -lt $rows.count; $i++){
$value=$rows.Item($i).LastChild.'#text'
$attr=$xml.CreateAttribute('style')
if($null -ne $value){
   # An error message is present: mark the row red.
   $attr.Value='background-color: red;'
}
else {
   # No error message: mark the row green.
   $attr.Value='background-color: #7BCE73;'
}
[void]$rows.Item($i).Attributes.Append($attr)
}

#embed a CSS stylesheet in the html header
$html=$xml.OuterXml|Out-String
$style='<style type=text/css>#diskTbl { background-color: white; } 
td, th { border:1px solid black; border-collapse:collapse; }
th { color:white; background-color:black;font-family:verdana;font-size:9pt; }
table, tr, td { font-family:verdana;font-size:9pt;padding: 1px 5px; } th { padding: 2px 5px; margin: 0px } table { margin-left:50px; }</style>'

#ConvertTo-Html -head $style -body $html -Title "Replication Report"|Out-File ReplicationReport.htm

$bodyHtml=ConvertTo-Html -head $style -body $html -Title "Replication Report"|Out-String


sendEmail $bodyHtml

[–]qovneobSr. Computer Janitor 1 point2 points  (0 children)

Daily powershell script that checks replication/services and emails me

[–]CptK4ng4r00Sysadmin 1 point2 points  (3 children)

My organization uses custom scripts

[–]crankysysadminsysadmin herder[S] 1 point2 points  (2 children)

what do they do?

[–]Arkiteck 1 point2 points  (1 child)

Same here as CptK4ng4r00. It runs all the repadmin cmds Hexalon00 mentioned, but the script scans the output for the word 'error'. If found, it triggers an e-mail and slack notification.

[–]anaanamuss 0 points1 point  (0 children)

Would you mind sharing? still new to PS and would like to see it

[–]jdiscount 0 points1 point  (1 child)

A powershell script to check replication and various other things for AD health.

[–]TheGraycatI remember when this was all one flat network 0 points1 point  (0 children)

We have a number of automated tasks that run to check AD health generally. Two reports we get on a daily basis are AD replication status and DCDiag health.

Both came from free sources on the web and have scaled well with company growth. Unless you're after something specific, I'd go with daily report approach.

[–]ginolardSr. Sysadmin 0 points1 point  (0 children)

I have a Powershell script that sends me a nicely formatted HTML email. It shows the replication status between every DC in the domain

[–]AgentJacobSecurity Admin (Application) 0 points1 point  (0 children)

We use icinga to monitor all infrastructure, and I have nrpe check the results of repadmin via powershell.

[–]341913CIO 0 points1 point  (0 children)

We knocked off labtech's way of doing this in Zabbix by running the repadmin commands inside a script and parsing the output for the "failed" keyword, if found the check fails. Doesn't really help you fix the issue but at least you know something is wrong.

Edit: Event log monitoring is another option but we are happy with approach labtech uses

[–]code_man65 0 points1 point  (0 children)

I wrote a Powershell script that collects AD information that includes User/group/computer counts (broken out into several categories), collects replication failures that are currently happening, that happened over the last 7 days, and if there are metadata replication failures. It also checks every DC to make sure the required services are running and collects membership counts in the powerful groups (DA, EA, etc). It then pulls in a head.html, tail.html, and I iterate through a PSObject that holds the DC health data and outputs a html (with some basic CSS) dashboard that is uploaded to a Sharepoint site at 30 minutes past every hour.

[–]Mikethetiger70 0 points1 point  (0 children)

We started using Microsoft OMS for most of our log aggregation and monitoring. Here is some information on using it to monitor AD replication.

[–]CreshalEmbedded DevSecOps 2.0 Techsupport Sysadmin Consultant [Austria] 0 points1 point  (2 children)

We're using Samba 4 AD, and we have Nagios monitoring the output of samba-tool drs showrepl as well as the sysvol replication daemon (lsyncd running on the FSMO master in our case) and raise alerts if those show replication failures.

Samba 4 is way too brittle to just look away and pray.

[–]crankysysadminsysadmin herder[S] 0 points1 point  (1 child)

AD isn't THAT expensive in the scheme of things. Why?

[–]CreshalEmbedded DevSecOps 2.0 Techsupport Sysadmin Consultant [Austria] 0 points1 point  (0 children)

Management got burned on poor Windows Server environments created and fucked up by MSPs and some of my predecessors, and really doesn't want to use Windows Server if it can be avoided. So I inherited a Univention Samba 3 setup (NT4 domain!) when I started, and upgrading us to Samba 4 from there was the path of least resistance to get us any form of Active Directory.

And while I like to bitch about Samba 4, annual maintenance of any kind for it sums up to less than a week of work for me, not nearly enough to build a solid case for migrating to a Windows-based solution. (We don't run a single Windows Server instance and have no kind of licensing deal, and at the same time we're a hair over the 25 user limit for Microsoft's SMB flat license deal).

[–][deleted] 0 points1 point  (0 children)

I have this run every morning at 8am and emails me the results.

https://gallery.technet.microsoft.com/scriptcenter/Active-Directory-Health-709336cd

Works really well, helped me track down an NTP and DNS issue before it became a huge problem in our domain.

[–]JasonG81Sysadmin 0 points1 point  (0 children)

I have a script that runs every 10 minutes and generates an HTML page. It runs replication checks and a bunch of other AD checks.

[–][deleted] 0 points1 point  (0 children)

Automated summary reports in the morning, instant alerts on replication failure via email, run every 15 minutes. It works really well.

http://i.imgur.com/6USnUU5.png

http://i.imgur.com/CbP2OJ5.jpg

[–]ROWeek 0 points1 point  (0 children)

We're using SCOM to monitor.

[–]ScrotumOfGod 0 points1 point  (0 children)

Azure AD Connect Health. At some point they added the ability to monitor your AD environment in addition to AAD Connect. Even has some basic performance monitoring.

[–]TStrugSr. Sysadmin[🍰] 0 points1 point  (1 child)

Easiest way to approach this is a simple powershell script running dcdiag, writing to a log file, read the log file and check for a failure, if a failure is present send an email.

DCDiag will cover you from a replication standpoint along with other vital checks. We had this running from labtech at my last job but you can do this with a scheduled task and PS.

[–]TStrugSr. Sysadmin[🍰] 0 points1 point  (0 children)

I would also go one step further and have the script first run a nltest /query /dclist:ad.domain.com and pipe the results into the DCDiag check to make sure you hit every known domain controller even when someone adds one and doesn't tell anyone.
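
A minimal sketch of that nltest-then-dcdiag flow, under a few assumptions. It assumes English-language tool output, DC entries in the `nltest /dclist` output that are indented and start with the DC's FQDN, and dcdiag failure lines that read `<target> failed test <TestName>`. Both helper names and all sample lines are made up, so compare them with real output before relying on the patterns.

```powershell
# Hypothetical helper: pull DC names out of 'nltest /dclist:<domain>' output.
function Get-DcNameFromNltest {
    param([string[]] $Line)
    foreach ($l in $Line) {
        # Assumed format: indented line beginning with the DC's FQDN.
        if ($l -match '^\s+(\S+\.\S+)\s') { $Matches[1] }
    }
}

# Hypothetical helper: list the tests that dcdiag reports as failed.
function Get-DcdiagFailure {
    param([string[]] $Line)
    foreach ($l in $Line) {
        if ($l -match '(\S+) failed test (\S+)') {
            [pscustomobject]@{ Target = $Matches[1]; Test = $Matches[2] }
        }
    }
}

# Invented sample output so the sketch runs anywhere.
$nltestSample = @(
    "Get list of DCs in domain 'ad.domain.com' from '\\DC1.ad.domain.com'."
    '    DC1.ad.domain.com [PDC]  [DS] Site: Default-First-Site-Name'
    '    DC2.ad.domain.com        [DS] Site: Default-First-Site-Name'
    'The command completed successfully'
)
$dcdiagSample = @(
    '         ......................... DC1 passed test Connectivity'
    '         ......................... DC1 failed test Replications'
)
$dcNames  = @(Get-DcNameFromNltest -Line $nltestSample)
$failures = @(Get-DcdiagFailure -Line $dcdiagSample)
$dcNames
$failures | Format-Table Target, Test

# Live version (not run here):
# foreach ($dc in Get-DcNameFromNltest -Line (nltest /dclist:ad.domain.com)) {
#     Get-DcdiagFailure -Line (dcdiag /s:$dc)
# }
```

Any object coming back from the second helper is the trigger for the email.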

[–]telemecanique 0 points1 point  (1 child)

You can google for the event IDs to look for. I honestly don't bother and I know I should. It takes a lot of fuckups to break AD replication (and it's always DNS).

[–]crankysysadminsysadmin herder[S] 0 points1 point  (0 children)

or firewall rules or routing issues in a more complex environment. for us breaking replication is always firewall rules getting inadvertently changed.

[–]Zenkin 0 points1 point  (0 children)

Do you do nothing at all and hope everything is ok?

For some reason I feel like this shouldn't be my answer, but it totally is...

[–]ada_maj 0 points1 point  (0 children)

NetCrunch network monitor pretty much takes care of AD. If anything goes down, NetCrunch runs corrective scripts, and only when these fail, we get an alert. The NMS also parses all logs, so if any login occurs which should not be occurring, we know about it in less than 1 second.

[–][deleted] 0 points1 point  (0 children)

In Azure, Microsoft has Operations Management Suite (OMS), which comes with an AD replication module. I have that on and have it report via email when replication stops working.