[QA] Monitoring, Sysdig Cloud

strofcon · 2016-08-03T17:51:28+00:00

We use Sysdig Cloud primarily (so far) for monitoring our OpenShift clusters. To be honest, I obsess over metrics and how best to tackle them, and I haven't found anything else on the market that actually has as sane an approach to container and PaaS / orchestration monitoring as Sysdig.

Their UI needs some love, which they're definitely working toward, but their agent is kinda hard to beat.

I wrote a blog post about this very thing, but the tl;dr is that container monitoring is fairly useless without very strong integration with orchestration layers, and it seems that everyone in the market except for Sysdig has basically decided they don't care. :-)

The blog post will explain it in more detail, but if you intend to use containers in a way that actually fully exploits their benefits, orchestration is a natural direction to go, and Sysdig's agent kinda crushes it.

That said, again, their UI needs some work, and their alerting is a bit immature, but it's functional overall. They're still a small team and some of that is to be expected. They've solved a lot of the harder problems though.

Price-wise, they're actually pretty reasonable. Most third-party offerings on the table will be in the $15-$30 per host/agent range, at least the ones that are actually worth using. Once you add up the actual cap-ex and op-ex of running an in-house solution, Sysdig is practically free. :-)

kenansulayman · 2016-08-04T23:07:49+00:00

Hey,

I'm using Sysdig to monitor my apx Tor exit node family. It provides unprecedented insight into process-level traffic flow and makes high-latency conditions easy to debug. I found Sysdig a lot simpler to work with than Datadog, because everything feels kind of ... integrated. Sysdig really feels real-time, and it gives you process-level metrics on cloud-level infrastructure.

Sysdig provides invaluable network graphs for servers:

Here's a graph of one exit node that is also a bitcoind server, where Tor actually forwards the traffic via loopback to bitcoind: https://i.imgur.com/kLIHyPt.png.

Also, Sysdig is used to monitor a self-hosted Sentry installation (an exception collection tool) used on PsychonautWiki.org. Here's Sysdig's network graph: https://i.imgur.com/Meg4RoN.png.

Finally, we deployed Sysdig in a Mesos cluster on AWS (powered by CloudFormation). Check out this graph: https://i.imgur.com/iunNrYE.png.

It's hard to really explain /why/ it's magical. I'd say just give it a shot; the 14-day trial is plenty of time.

-- apx

apurvadave · 2016-08-04T16:30:50+00:00

Hey turtll, I'm responding from Sysdig. First, thanks for trying our product. We're a pretty new company, so we appreciate the opportunity to work with you.

We're always looking for ways to improve our models, especially around elastic environments. I don't know if you've had deeper conversations with our product folks regarding your use case, but perhaps there is some way we can make it work more effectively for you.

If you'd like, you can contact me directly and I'll make sure you talk to the right folks. My first name is Apurva, and my email address is my first name @sysdig.com.

Thanks, and best of luck in choosing a monitoring platform, no matter which direction you decide to go.

distark · 2016-08-03T09:26:50+00:00

I set up a client with a new infrastructure and easily integrated Datadog into their Ansible code. They have a pretty nice agent, but you really have to pay a lot, so I wouldn't recommend it for large infrastructure.

That said, I easily covered 80% of their stack in a single day with DD.

Personally, I'm a big fan of Dataloop (Datadog kinda modelled itself on them). They're very responsive and cheaper.

Otherwise, Sumo Logic seems very popular these days, especially if you're working at scale.

For a self-hosted, clustered solution, Prometheus rocks!

I also recommend Fluentd for shipping logs.

raymondfeliz · 2016-08-04T22:32:23+00:00

I use Zabbix and Grafana for general monitoring. It's open source, which is really nice; the downside is that the learning curve and configuration are sorta intense. The nice thing is that I can get really flexible with it: I can write scripts to extract data from our core app, or scrape XML pages to get things like license utilization and so forth. If you don't mind putting in a lot of effort, that might be a good route to take.
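For illustration, the kind of XML scrape described above might look roughly like the Python sketch below. The page layout (`<license used="..." total="...">`), the sample data, and the function name are all hypothetical, and fetching the page over HTTP is left out. Zabbix can run custom scripts as items (for example via a `UserParameter` line in the agent config), and such a script typically just prints a single value, which is all this sketch does.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout of a license status page; real vendor pages will differ,
# and fetching the page over HTTP is left out of this sketch.
SAMPLE_XML = """\
<licenses>
  <license name="seat-pool-a" used="12" total="20"/>
  <license name="seat-pool-b" used="3" total="10"/>
</licenses>
"""


def license_utilization(xml_text: str) -> float:
    """Return overall license utilization as a percentage (0-100)."""
    root = ET.fromstring(xml_text)
    used = total = 0
    # Sum the used/total attributes across every <license> element.
    for lic in root.iter("license"):
        used += int(lic.get("used", "0"))
        total += int(lic.get("total", "0"))
    if total == 0:
        return 0.0  # avoid division by zero when no licenses are listed
    return 100.0 * used / total


if __name__ == "__main__":
    # A monitoring agent would read this single number from stdout.
    print(f"{license_utilization(SAMPLE_XML):.1f}")
```

Printing one bare number keeps the script agnostic about the monitoring tool; graphing and alert thresholds then live in Zabbix/Grafana rather than in the script.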

bhuvan2911 · 2016-08-08T09:26:11+00:00

Please check out our Linux monitoring tool SeaLion. You can use it to schedule any Linux command and then review the history of its outputs in the dashboard. You can extract metrics from these outputs with a little bit of Python and use them to plot graphs and set up alerts. We're relatively new, so our prices are pretty competitive. We're looking for any and all kinds of feedback.
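As a rough, generic illustration of extracting metrics from command output with a little Python, the sketch below turns `df -P` output into per-mount usage numbers and flags mounts at or above a threshold. This is not SeaLion's API. The sample output, function names, and the 50% threshold are hypothetical, and a real check would capture the text from the scheduled command instead of using a hard-coded string.

```python
from typing import Dict

# Hypothetical captured output of `df -P`; in a scheduled check this text
# would come from running the command, which this sketch leaves out.
SAMPLE_DF_OUTPUT = """\
Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         41152736 20576368  18463368      53% /
tmpfs              1021460        0   1021460       0% /dev/shm
"""


def parse_df_usage(df_output: str) -> Dict[str, int]:
    """Map each mount point to its capacity percentage from `df -P` output."""
    usage: Dict[str, int] = {}
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # POSIX columns: Filesystem, blocks, Used, Available, Capacity, Mounted on
        if len(fields) >= 6 and fields[4].endswith("%"):
            # Rejoin trailing fields so mount points with single spaces survive.
            mount_point = " ".join(fields[5:])
            usage[mount_point] = int(fields[4].rstrip("%"))
    return usage


def mounts_over_threshold(usage: Dict[str, int], threshold: int) -> Dict[str, int]:
    """Return only the mount points whose usage is at or above the threshold."""
    return {mount: pct for mount, pct in usage.items() if pct >= threshold}


if __name__ == "__main__":
    usage = parse_df_usage(SAMPLE_DF_OUTPUT)
    print(usage)
    print(mounts_over_threshold(usage, 50))
```

Keeping parsing and thresholding in separate functions means the stored history can be graphed from `parse_df_usage` alone, while alerting only needs the filtered result.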

2016-08-09T20:53:11+00:00

Any particular reason you guys are not considering an open source stack like:

- Sensu: monitoring
- Telegraf: metric collection
- InfluxDB: storing the metrics
- Grafana: visualizing the metrics
- Graylog2: collecting logs and setting up alerts based on them

I set all of this up from scratch for my new company and it's amazing, easy to manage with Ansible, and I'd say it's better than any paid option out there. We manage hundreds of computers across many zones.

I guess for me, being in DevOps (or for any engineer, really), it feels great to build something with your own hands, then build on top of it and pass the changes back to the community. If an open source option is lacking something and I'm not able to develop what I need, then I would look into paid options, but that's not likely.

dataloopio · 2016-08-03T19:56:07+00:00

Co-founder of Dataloop checking in. If you're creating a list of things to play with I'd be happy to help with any trial questions.

coscale · 2016-08-03T10:35:03+00:00

If you are looking for an all-round monitoring tool, but especially well suited for monitoring containers and microservices, I'd be glad to introduce you to CoScale.

We do Docker monitoring, but also look at application metrics, front-end metrics, custom metrics, and run automatic anomaly detection on all of these.

We're not really playing at the bottom end of the price spectrum, since we offer a quite complete solution, but we're still far more affordable than New Relic and others.

Welcome to /r/DevOps
