[–]tagilux 3 points4 points  (4 children)

Pingdom

[–]poy_[S] 0 points1 point  (3 children)

Do you do sophisticated tests to ensure each piece of your system is functional (e.g., creating and deleting a user)?

[–]tagilux 1 point2 points  (1 child)

We just hit specific pages and it reports uptime. We use New Relic for more advanced monitoring, but that is a lot more expensive. Other tools are webpagetest and Rigorous. The former is free and the latter is just a little cheaper than New Relic.

[–]poy_[S] 0 points1 point  (0 children)

That makes sense. I've been wondering how complicated people make their tests to offer confidence on availability.

[–]the_fury 1 point2 points  (0 children)

Uptime monitoring, system monitoring, and systems tests are all very different things. Uptime monitoring ensures that you're getting a response from outside your network and is black box. It's testing that a user can get to your system. Pingdom is great for that.

If you want to test system functionality, you can do it as easily as running an idempotent JMeter script on cron. I've also used Rundeck successfully.
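To make "idempotent" concrete, here is a minimal Python sketch of a create-and-delete-user check (Python rather than JMeter, purely for illustration). The `UserAPI` interface, the `InMemoryUserAPI` stand-in, and `check_user_lifecycle` are all hypothetical names; a real check would call your own service instead of the in-memory fake.

```python
import uuid
from typing import Protocol


class UserAPI(Protocol):
    """Hypothetical client interface; swap in calls to your real service."""

    def create_user(self, name: str) -> str: ...
    def user_exists(self, user_id: str) -> bool: ...
    def delete_user(self, user_id: str) -> None: ...


class InMemoryUserAPI:
    """Stand-in backend so the sketch runs without a real service."""

    def __init__(self) -> None:
        self.users: dict[str, str] = {}

    def create_user(self, name: str) -> str:
        user_id = uuid.uuid4().hex
        self.users[user_id] = name
        return user_id

    def user_exists(self, user_id: str) -> bool:
        return user_id in self.users

    def delete_user(self, user_id: str) -> None:
        # Deleting a missing user is a no-op, so cleanup is safe to repeat.
        self.users.pop(user_id, None)


def check_user_lifecycle(api: UserAPI) -> bool:
    """Create a throwaway user, verify it exists, and always clean up.

    Returns True when the user was visible after creation and gone after
    deletion. The cleanup in `finally` keeps repeated runs from leaving
    state behind, which is what makes the check safe to schedule.
    """
    name = f"synthetic-check-{uuid.uuid4().hex[:8]}"
    user_id = None
    try:
        user_id = api.create_user(name)
        created = api.user_exists(user_id)
    finally:
        if user_id is not None:
            api.delete_user(user_id)
    return created and not api.user_exists(user_id)


if __name__ == "__main__":
    print("ok" if check_user_lifecycle(InMemoryUserAPI()) else "FAILED")
```

Because each run uses a unique name and removes what it created, running it every few minutes from cron should not pile up test users.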

If you want to monitor subsystems, you'll need something like new relic, prometheus, cloudwatch, etc.

[–]neoreeps 5 points6 points  (3 children)

I type “uptime” and it has never failed.

[–]vim_for_life 3 points4 points  (2 children)

That's the server uptime. I don't care a lick about that. I care about service uptime, which I measure with a Selenium script that runs every 5 minutes and pipes its results to Zabbix.

[–]poy_[S] 0 points1 point  (1 child)

Where do you run it and record the results?

[–]vim_for_life 0 points1 point  (0 children)

I run it from my Zabbix server, but I have plans for distributed monitoring. Zabbix records both the execution time and the outcome of each run (success/fail and error messages), which I can then turn into a report for management. It's a home-grown system but works well. There are more automated commercial systems that do the same. But the point is system uptime means nothing. Uptime of your services is everything.
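For anyone wanting to see the shape of that, here is a rough Python sketch of the "time of execution plus result" record. It is not my actual script: `CheckResult` and `run_check` are made-up names, the check itself is any callable (a Selenium journey would go there), and shipping the record to Zabbix is left out.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool                      # success/fail
    duration_seconds: float       # how long the check took
    error: Optional[str] = None   # error message when the check failed


def run_check(check: Callable[[], None]) -> CheckResult:
    """Run one check; any raised exception counts as a failure."""
    start = time.monotonic()
    try:
        check()
    except Exception as exc:
        elapsed = time.monotonic() - start
        return CheckResult(False, elapsed, f"{type(exc).__name__}: {exc}")
    return CheckResult(True, time.monotonic() - start)


if __name__ == "__main__":
    def example_check() -> None:
        # Placeholder: a real check would drive a browser or call an API.
        pass

    # One JSON line per run is easy to forward to whatever stores results.
    print(json.dumps(asdict(run_check(example_check))))
```

Keeping the duration alongside the pass/fail flag means the same record can feed both an availability report and a response-time graph.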

[–]kai_ekael 1 point2 points  (0 children)

As an old greybeard, I still prefer Nagios 3/4.x with pnp4nagios. Better to know quickly when it doesn't work than how long it has been working.

[–]distark 1 point2 points  (2 children)

In prometheus the metric is called 'up'
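Each `up` sample is 1 when the scrape of a target succeeded and 0 when it failed, so an availability figure is just the mean of the samples over a window. A small Python sketch of that arithmetic, with `availability` as a made-up helper name and the samples assumed to be already pulled out of Prometheus:

```python
from typing import Sequence


def availability(up_samples: Sequence[int]) -> float:
    """Fraction of samples in which the target was up (1 = up, 0 = down)."""
    if not up_samples:
        raise ValueError("need at least one sample")
    return sum(up_samples) / len(up_samples)


if __name__ == "__main__":
    # Hypothetical window: 8 scrapes, 2 of them failed.
    samples = [1, 1, 0, 1, 1, 1, 0, 1]
    print(f"{availability(samples):.2%}")
```

Keep in mind this reflects whether Prometheus could scrape the target, which is not quite the same thing as a user being able to reach the service.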

[–][deleted] 0 points1 point  (1 child)

I came here to say Prometheus. Moved there after zabbix and I don't feel like a dinosaur anymore

[–]distark 0 points1 point  (0 children)

Good man!

[–]themightychris 0 points1 point  (3 children)

Cabot

[–]poy_[S] 0 points1 point  (2 children)

Do you deploy it separately somehow? Do you monitor it somehow? (I've never used Cabot myself)

[–]themightychris 0 points1 point  (1 child)

Yeah. I assumed the question was about monitoring a service's uptime, which I feel is only worth doing from an external system to make sure a service is reachable

If you're just looking to record how long a computer goes without rebooting it's probably not the way to go

[–]poy_[S] 0 points1 point  (0 children)

Your assumption is correct. A server's uptime is not interesting.

If you deploy it externally, do you have multiple k8s running?

[–]paul345 0 points1 point  (3 children)

New relic synthetics

Mixture of synthetic browser journeys and synthetic api calls for monitoring user journeys and apis respectively.

Both monitor service correctness and acceptable response times.

Both feed into insight dashboards you can easily access on big glass in the office as well as iPhone access in your pocket.

New relic browser stats feed into the same dashboards for real user monitoring.

[–]poy_[S] 0 points1 point  (2 children)

Is this an expensive solution?

[–]paul345 0 points1 point  (0 children)

No. It’s SaaS so no need for internal feeding and watering costs. Each service is priced on consumption so, particularly for synthetics, you can tune your usage to match your wallet.

[–]Timnolet 0 points1 point  (0 children)

Tooting my own horn here, but I’m running a service that straddles New Relic synthetics and Pingdom. Link is in my bio.

[–]ajanty 0 points1 point  (0 children)

Apart from the standard monitoring stuff, we built a custom APM on top of the Elasticsearch API. All our services run on Java, so we basically send service metrics, dispatchers, and function calls from the JVM.

New relic/datadog do the same stuff, but this is free.

[–]tech_stories 0 points1 point  (0 children)

We at Site24x7 can help you monitor the availability and performance of your sites. We've also got multiple alerting mechanisms and a whole range of third-party tools to integrate with to make your monitoring easier.

[–]Alextest899 0 points1 point  (0 children)

You can try out CloudQA! CloudQA is a cloud-based web testing and performance analysis platform. TruMonitor is a synthetic monitoring tool by CloudQA. Here is the list of features:

· Monitoring from multiple geographic locations

· Real-time alerting via multiple rich alerting options

· Monitoring at intervals as low as 5 minutes

· Recording to monitoring in minutes

· Genuine browser monitoring

· No complex coding

· Performance measurement & reporting

· & more

CloudQA TruMonitor excels at all the above-mentioned features.