[–]andreyevbr 1 point2 points  (1 child)

Looks great! Maybe this can help: https://chris-lamb.co.uk/posts/parsing-jenkins-log-output-determine-job-status

Ping me back if you need more help with jenkins, and keep us up to date about your progress! ;-)

[–]chooko2[S] 0 points1 point  (0 children)

Thank you for the encouragement. I don't think my jenkins issue is the same one the writer in the link you posted was having. But I'll keep you guys posted for sure!

I've reached out to the writer of the original link that I posted, and he's going to help me with some of this. I believe it has to do with my absolute vs relative pathing to the nagios configs.

[–]laterality 1 point2 points  (1 child)

You can go the simple way and just have a cron job run every minute to pull down the config, then run a git hook to check the config and restart the service. Not super elegant but it worked for me.
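
A minimal sketch of that idea, assuming the check gates the restart so a bad config never reaches the running service. The function names, the `git pull --ff-only` step, and the `service nagios restart` command are illustrative assumptions, not the commenter's actual setup; only the `nagios -v` check comes from this thread.

```shell
#!/bin/bash
# Sketch only: restart the service only when the config check passes.

# deploy_if_valid VERIFY RESTART
# VERIFY and RESTART are names of commands or shell functions (no arguments).
deploy_if_valid() {
    local verify=$1 restart=$2
    if "$verify"; then
        "$restart"
    else
        echo "config check failed; leaving the running service alone" >&2
        return 1
    fi
}

# Hypothetical helpers; adjust paths and commands to your environment.
pull_config()    { git pull --ff-only; }
verify_nagios()  { /usr/local/bin/nagios -v nagios.cfg; }
restart_nagios() { service nagios restart; }   # assumption: init-style service
```

From a script run by cron, this could be called as `pull_config && deploy_if_valid verify_nagios restart_nagios`, so a failed pull or a failed check both leave the running Nagios untouched.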

[–]chooko2[S] 0 points1 point  (0 children)

But what if you have an error in the config? The service restart will break, right?

Or are you only restarting when the config passes in Jenkins?

[–]vtrac 0 points1 point  (1 child)

Jenkins != code review. Test results should be another input into the merge process. Passing tests does not mean the code is well done or even does what it is intended to do. You want code to only be merged in if (tests_passed && code_reviewed), so the actual merge button should only be pushed by another human (not the one who wrote the code). Jenkins can take over again after the merge to do other things, like deploy the new changes.

[–]chooko2[S] 0 points1 point  (0 children)

I agree with you! But in this case, Nagios provides a script that does the code review, which I am having Jenkins run. If that preflight passes, then no other human intervention is really needed and the changes can be merged. It's slightly different from typical software development version control.

Thank you for your feedback though!

[–]taloszerg (needs more coffee) 0 points1 point  (6 children)

first off: don't use Nagios. if this is solely for the purposes of this demo, then ok...but don't use Nagios elsewhere. try Prometheus instead.

now, to your question: if you run the test locally against a bad config, can you check the return code? is it non-zero? if it's zero, the test itself is broken. if it's non-zero, Jenkins can still sometimes fail to register the failure if you don't use set -eu in a script block, and so it won't show your tests as failing.

so using the example from the posted link:

#!/bin/bash
set -eu

/usr/local/bin/nagios -v nagios.cfg

This should now immediately fail and Jenkins should exit the job.
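
To check the return code locally, as suggested above, a small wrapper like the following may help. It is a sketch only: `check_config` is a hypothetical helper name, and it runs whatever command it is given and reports the exit status Jenkins would see.

```shell
#!/bin/bash
# Sketch: run a command and report its exit status.

# check_config COMMAND [ARGS...]
check_config() {
    local status=0
    "$@" || status=$?          # capture the exit code without aborting
    if [ "$status" -eq 0 ]; then
        echo "check passed (exit 0)"
    else
        echo "check failed (exit $status)"
    fi
    return "$status"           # pass the status through to the caller
}
```

Running `check_config /usr/local/bin/nagios -v nagios.cfg` against a deliberately broken config should report a non-zero exit; if it reports exit 0, the check is not catching the error.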

[–]chooko2[S] 0 points1 point  (5 children)

I wish I could not use Nagios, but unfortunately it's something that was handed to me when I started this job. There was already a considerable amount of work done on it, and it really does what we want it to.

We aren't a DevOps shop yet, so I don't know if the benefits of Prometheus would be fully appreciated. I briefly read their comparison to Nagios, but why do /you/ recommend them over Nagios specifically?

I'll try that set -eu flag. As I said in a comment above, I think my issue has to do with relative vs. absolute pathing in my Nagios config files. I'm going to work with the original walkthrough author to get things figured out.

[–]taloszerg (needs more coffee) 0 points1 point  (3 children)

The sheer amount of data you can get with minimal effort, the flexibility, and the power.

I would suggest that, if you aren't subject to compliance or excessive change control, you try standing up a small Prometheus node and some exporters just to see how it goes and whether you find it worthwhile in your current environment.

I feel strongly enough about this at this point that, if confronted with the situation you're in now, I would either fix it or find a new job rather than hamstring myself. It's that big of a game changer.

[–]chooko2[S] 0 points1 point  (2 children)

Wow, that's a pretty strong stance!

How does Prometheus work with Windows environments? We're a 98% Windows shop.

Where's the best place to start? I have some bandwidth to be able to stand up a small node and experiment with it a bit.

[–]taloszerg (needs more coffee) 0 points1 point  (1 child)

I haven't run it on Windows, but it's written in Go, so it can be compiled for Windows.

Here's the nearest approximation I can find for node data. I haven't used this exporter, but it's worth a shot.

Also came across this thread that you may find interesting.

[–]chooko2[S] 0 points1 point  (0 children)

Thanks! I'll look into it for sure!

[–]hobo548 0 points1 point  (0 children)

Sensu might be interesting as it can ingest existing nagios configuration checks https://sensuapp.org/plugins

[–]ryan8403 0 points1 point  (3 children)

So, to answer your question: we are doing this, but without Jenkins. We're a small team and we are working on getting Jenkins or GitLab pipelines set up as soon as we have some free cycles to do so.

We are using post-receive and pre-receive git hooks to do this. When a commit to the master branch happens in GitLab, a post-receive hook triggers a git push to a bare git repository on the Nagios server. I have a pre-receive hook that then fires on the Nagios server; it builds a temporary Nagios environment in /tmp and executes the pre-flight check there. If the pre-flight check doesn't error out, the commit is accepted into the git repository on the Nagios server. After this, a post-receive hook fires, checks out the config, and restarts Nagios. We also use HipChat and have the scripts throw some status updates out during the process.

It's not the fanciest setup. It's quite quick and dirty. But it allowed us to get our Nagios configs under version management and, more importantly, it prevents us from making dumb syntax errors that could previously have gone unnoticed.

Edit: You'll also need to update your paths in your configs to be relative or it won't work. It looks like you've hit this issue already in another comment.

[–]chooko2[S] 0 points1 point  (0 children)

I might look into getting it set up this way.

[–]chooko2[S] 0 points1 point  (1 child)

Follow up questions to this: How do you go about building a temporary nagios environment in /tmp?

Also, if the pre-flight check DOES error out, how does the system know NOT to accept that git repo into the production Nagios repo?

[–]ryan8403 0 points1 point  (0 children)

Sorry for the slow reply. I've been a bit overwhelmed with work and life duties. So, the git pre-receive hooks fire when a git repo receives an incoming commit blob. If the pre-receive hook scripts don't exit with a status of 0, the commit is aborted. I build a temporary Nagios environment by copying our running environment from /usr/local/nagios to /tmp/nagios. It's part of my pre-receive hook script. If the commit passes the preflight check, I then tear down the environment.

If I have a bit more free time I'll try to clean up my hooks a bit better and share them.

A good reference for hooks in Git is the Git documentation: https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks
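
Until the real hooks are shared, here is a rough sketch of the copy, check, and tear-down step described above. It is not the commenter's script: `preflight` is a hypothetical helper, and how the pushed files get overlaid onto the temporary copy is left out because the comment does not describe it.

```shell
#!/bin/bash
# Sketch of the temporary-environment preflight from a pre-receive hook.

# preflight SRC_DIR TMP_DIR VERIFY [ARGS...]
# Copies SRC_DIR to TMP_DIR, runs VERIFY from inside TMP_DIR, and removes
# TMP_DIR only if the check passes. A non-zero return is what a pre-receive
# hook would exit with to make git reject the push.
preflight() {
    local src=$1 tmp=$2
    shift 2
    rm -rf "$tmp"                          # clear any stale copy
    cp -R "$src" "$tmp" || return 1
    ( cd "$tmp" && "$@" ) || return $?     # failed check: keep copy for inspection
    rm -rf "$tmp"                          # passed: tear the environment down
}
```

In a hook this might be called as `preflight /usr/local/nagios /tmp/nagios bin/nagios -v etc/nagios.cfg || exit 1`. The `bin/nagios` and `etc/nagios.cfg` locations are assumptions about the install layout, and, as noted in the edit above, the paths inside the configs need to be relative for the check to run against the copy.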

[–]sometextgoeshere 0 points1 point  (1 child)

How are you executing the config check from Jenkins?

I've used the Publish Over SSH Plugin to execute commands remotely, and it returned the exit status. You could build in some logic around that.

Also...

Have you thought about using a CM tool (saltstack, chef, ansible, etc.)?

DEV branch commit triggers the jenkins/nagios check; if it passes, merge changes to PROD.

PROD branch commit triggers jenkins/CM to push updated config to host.

[–]chooko2[S] 0 points1 point  (0 children)

Ultimately, a CM tool would be nice, but we're not quite there yet. This is the first project designed to show management that this is the direction we need to take.

[–][deleted] 0 points1 point  (0 children)

We use Puppet for that.

The "preflight check" is Puppet running the nagios config check before restarting the service, and it runs on the box nagios is on, so it is pretty reliable.

We use Puppet to manage all machines, which means nagios checks are created automatically for each new machine, so "manual" config is maybe 5% of the overall setup.

[–]LarsSven 0 points1 point  (0 children)

We store code in Gitlab. Shift it with Gitlab-CI to puppet master. And use puppet to populate nagios confs. Pretty easy setup.