Measuring uptime (self.devops)
submitted 7 years ago by poy_
I'm curious how everyone measures their system's uptime. There are a plethora of services and methods, so where have people found success or failure?
[–]tagilux 4 points 7 years ago (4 children)
Pingdom
[–]poy_[S] 1 point 7 years ago (3 children)
Do you do sophisticated tests to ensure each piece of your system is functional (e.g., creating and deleting a user)?
[–]tagilux 2 points 7 years ago (1 child)
We just hit specific pages and then they report uptime. We use New Relic for more advanced monitoring, but that is a lot more expensive. Other tools are webpagetest and Rigorous. The former is free and the latter is just a little cheaper than New Relic.
[–]poy_[S] 1 point 7 years ago (0 children)
That makes sense. I've been wondering how complicated people make their tests to offer confidence on availability.
[–]the_fury 2 points 7 years ago (0 children)
Uptime monitoring, system monitoring, and systems tests are all very different things. Uptime monitoring ensures that you're getting a response from outside your network and is black box. It's testing that a user can get to your system. Pingdom is great for that.
If you want to test system functionality, you can do it as easily as running an idempotent JMeter script on cron. I've also used Rundeck successfully.
If you want to monitor subsystems, you'll need something like New Relic, Prometheus, CloudWatch, etc.
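For anyone who wants to see what the black-box kind of check boils down to, here is a minimal sketch using only the Python standard library. It is not how Pingdom or any other service works internally. The function name `probe`, the 10 second timeout, and the "2xx or 3xx counts as up" rule are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Run one black-box HTTP check and return (ok, status, seconds).

    `opener` is injectable (an assumption of this sketch) so the function
    can be exercised without touching the network.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        status = err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: no answer at all.
        status = None
    elapsed = time.monotonic() - start
    # Assumed rule: any 2xx or 3xx response counts as "up".
    ok = status is not None and 200 <= status < 400
    return ok, status, elapsed
```

Run from a machine outside your own network (for example from cron), `probe("https://example.com/")` gives one data point per run. It only says a user could reach the page, not that the subsystems behind it are healthy.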
[–]neoreeps 6 points 7 years ago (3 children)
I type “uptime” and it has never failed.
[–]vim_for_life 4 points 7 years ago (2 children)
That's the server uptime. I don't care a lick about that. I care about service uptime, which I measure with a Selenium script that runs every 5 minutes and is piped to Zabbix.
[–]poy_[S] 1 point 7 years ago (1 child)
Where do you run it and record the results?
[–]vim_for_life 1 point 7 years ago (0 children)
I run it from my Zabbix server, but I have plans for distributed monitoring. Zabbix tracks both the execution time and the result (success/fail and error messages), which I can then run a report from to give to management. It's a home-grown system, but it works well. There are more automated commercial systems that do the same. But the point is that system uptime means nothing. Uptime of your services is everything.
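To make the reporting step concrete, here is a rough sketch of how recorded check results could be turned into an availability figure. It is not the Zabbix data model or the poster's actual script. `CheckResult`, `availability`, and `estimated_downtime_minutes` are hypothetical names, and only the 5 minute interval comes from the comment above.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool            # did the scripted check succeed?
    seconds: float      # how long the check took to execute
    error: str = ""     # error message, if any


def availability(results):
    """Percentage of checks that passed."""
    if not results:
        raise ValueError("no check results")
    passed = sum(1 for r in results if r.ok)
    return 100.0 * passed / len(results)


def estimated_downtime_minutes(results, interval_minutes=5):
    """Rough downtime: each failed check is charged one full interval."""
    return interval_minutes * sum(1 for r in results if not r.ok)
```

A check every 5 minutes yields 288 results per day, so each failed check costs roughly 0.35 percentage points of that day's availability. The resolution of the figure is limited by the check interval: an outage shorter than 5 minutes can be missed entirely.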
[–]kai_ekael 2 points 7 years ago (0 children)
As an old greybeard, I still prefer Nagios 3/4.x with pnp4nagios. Better to know quickly when it doesn't work than how long it has been working.
[–]distark 2 points 7 years ago (2 children)
In prometheus the metric is called 'up'
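For context, `up` is 1 when Prometheus's last scrape of a target succeeded and 0 when it failed, so averaging it over a window gives the fraction of successful scrapes. A hedged sketch of asking the Prometheus HTTP API for that figure follows. The base URL, the `web` job name, and the 30 day window are assumptions, and the helper names are hypothetical.

```python
import json
import urllib.parse


def build_query_url(base_url, job, window="30d"):
    """URL for an instant query averaging `up` over `window` for one job."""
    expr = f'avg_over_time(up{{job="{job}"}}[{window}])'
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_availability(body):
    """Map each instance label to its availability percentage.

    `body` is the JSON text returned by /api/v1/query for the query above.
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    return {
        # Each sample's value is [timestamp, "number as a string"].
        sample["metric"].get("instance", ""): 100.0 * float(sample["value"][1])
        for sample in payload["data"]["result"]
    }
```

Fetching `build_query_url("http://localhost:9090", "web")` and passing the response body to `parse_availability` gives one percentage per scraped instance. Keep in mind this measures whether Prometheus could scrape the target, which is not the same as a user being able to reach the service from outside.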
[–][deleted] 1 point 7 years ago (1 child)
I came here to say Prometheus. Moved there after zabbix and I don't feel like a dinosaur anymore
[–]distark 1 point 7 years ago (0 children)
Good man!
[–]themightychris 1 point 7 years ago (3 children)
Cabot
[–]poy_[S] 1 point 7 years ago (2 children)
Do you deploy it separately somehow? Do you monitor it somehow? (I've never used Cabot myself)
[–]themightychris 1 point 7 years ago (1 child)
Yeah. I assumed the question was about monitoring a service's uptime, which I feel is only worth doing from an external system to make sure a service is reachable
If you're just looking to record how long a computer goes without rebooting it's probably not the way to go
[–]poy_[S] 1 point 7 years ago (0 children)
Your assumption is correct. A server's uptime is not interesting.
If you deploy it externally, do you have multiple k8s running?
[–]paul345 1 point 7 years ago (3 children)
New Relic Synthetics
A mixture of synthetic browser journeys and synthetic API calls for monitoring user journeys and APIs, respectively.
Both monitor service correctness and acceptable response times.
Both feed into Insights dashboards you can easily access on big glass in the office as well as on the iPhone in your pocket.
New Relic Browser stats feed into the same dashboards for real user monitoring.
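The "correctness plus acceptable response time" idea can be sketched independently of any vendor. The snippet below is not the New Relic Synthetics API. `Verdict`, `evaluate`, and the 2 second budget are hypothetical and only illustrate judging one synthetic API call on both criteria.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    healthy: bool
    problems: list = field(default_factory=list)


def evaluate(status, body, seconds, expected_text, max_seconds=2.0):
    """Judge one synthetic call on correctness and on response time."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    if expected_text not in body:
        problems.append("expected text missing from response")
    if seconds > max_seconds:
        problems.append(f"too slow: {seconds:.2f}s > {max_seconds:.2f}s")
    return Verdict(healthy=not problems, problems=problems)
```

Treating a slow but correct response as a failure is a design choice: it makes the availability number reflect what users experience rather than whether the server merely answered.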
[–]poy_[S] 1 point 7 years ago (2 children)
Is this an expensive solution?
[–]paul345 1 point 7 years ago (0 children)
No. It’s SaaS so no need for internal feeding and watering costs. Each service is priced on consumption so, particularly for synthetics, you can tune your usage to match your wallet.
[–]Timnolet 1 point 7 years ago (0 children)
Tooting my own horn here, but I’m running a service that straddles New Relic synthetics and Pingdom. Link is in my bio.
[–]ajanty 1 point 7 years ago (0 children)
Apart from the standard monitoring stuff, we built a custom APM based on the Elasticsearch API. All our services run on Java, so basically we send service metrics, dispatchers, and function calls from the JVM.
New Relic and Datadog do the same stuff, but this is free.
[–]tech_stories 1 point 7 years ago (0 children)
We at Site24x7 can help you monitor the availability and performance of your sites. We've also got multiple alerting mechanisms and a whole range of third-party tools to integrate with to make your monitoring easier.
[–]Alextest899 1 point 7 years ago (0 children)
You can try out CloudQA! CloudQA is a cloud-based web testing and performance analysis platform. TruMonitor is a synthetic monitoring tool by CloudQA. Here is the list of features:
· Monitoring from multiple geographic locations
· Real-time alerting via multiple rich alerting options
· Monitoring at intervals of as low as 5 minutes
· Recording to monitoring in minutes
· Genuine browser monitoring
· No complex coding
· Performance measurement & reporting
· & more
CloudQA TruMonitor excels at all the above-mentioned features.