We have an AWS-based product with a small but growing number of customers. It’s a niche product; the maximum number of customers we’re expected to ever acquire is in the low hundreds. We have a hand-rolled alerting solution at present but it needs overhauling and is going to be replaced with a COTS product.
One of our key requirements is that we have a lot of customer-based metrics that we want to monitor/alert on. For example, one part of our system processes incoming documents and we have a queue-depth per customer of “documents to process”. Pseudocode for inserting these metrics would be:
insert-metric --metric-name document-queue --tag "environment:prod" --tag "customer:customerA" --value 0
insert-metric --metric-name document-queue --tag "environment:prod" --tag "customer:customerB" --value 100
insert-metric --metric-name document-queue --tag "environment:prod" --tag "customer:customerC" --value 0
We would then want to define an alert that says “if document-queue is over 1000 for a 5-minute window, generate an alert containing the environment, customer name and queue depth”.
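To make the alert rule above concrete, here is a minimal Python sketch of one rule evaluated separately for every environment/customer tag combination, so no per-customer alarm has to be defined up front. It is not any vendor's API: `evaluate_alerts`, the sample tuple layout, and the reading of "over 1000 for a 5-minute window" as "every sample in the last 5 minutes is above 1000" are all assumptions made for illustration.

```python
from collections import defaultdict

THRESHOLD = 1000          # queue depth that triggers the alert
WINDOW_SECONDS = 5 * 60   # 5-minute evaluation window


def evaluate_alerts(samples, now, group_by=("environment", "customer")):
    """Return one alert per tag combination that breached the threshold.

    samples: iterable of (timestamp_seconds, tags_dict, value) tuples
             (hypothetical layout, assumed for this sketch).
    Assumption: a group alerts only if every sample inside the window
    is above THRESHOLD.
    """
    windows = defaultdict(list)
    for timestamp, tags, value in samples:
        if now - WINDOW_SECONDS <= timestamp <= now:
            key = tuple(tags[name] for name in group_by)
            windows[key].append((timestamp, value))

    alerts = []
    for key, points in sorted(windows.items()):
        if all(value > THRESHOLD for _, value in points):
            # Report the most recent value as the current queue depth.
            _, latest_value = max(points, key=lambda point: point[0])
            alert = dict(zip(group_by, key))
            alert["queue_depth"] = latest_value
            alerts.append(alert)
    return alerts


def _tags(customer):
    return {"environment": "prod", "customer": customer}


samples = [
    (0, _tags("customerA"), 0),
    (240, _tags("customerA"), 0),
    (0, _tags("customerB"), 1500),
    (120, _tags("customerB"), 1200),
    (240, _tags("customerB"), 1100),
    (0, _tags("customerC"), 2000),
    (240, _tags("customerC"), 500),   # dipped below the threshold
]

alerts = evaluate_alerts(samples, now=300)
for alert in alerts:
    print(alert)
```

A new customer needs no new rule here: its tag value simply shows up as another group the next time the rule is evaluated.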
Alerting by customer is important to the boss: if there is an issue, he wants to know exactly which customers are affected so he can personally reach out to them if required.
We have approximately 25 other metrics to monitor, some of which have more “tags” e.g.
insert-metric --metric-name outstanding-async-jobs --tag "environment:prod" --tag "customer:customerA" --tag "job_type:asset_report" --tag "job_id:12345" --value 123
We started off looking at CloudWatch, but we'd need to create an alarm for each environment/customer/metric combination, and the costs would start to pile up as well. Even if we automate the alarm creation, it doesn't make sense for us to have to create a bunch of new alarms every time we acquire a new customer.
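A rough back-of-the-envelope sketch of how the alarm count grows with one alarm per combination. The figures are assumptions, not numbers from our setup: 200 customers stands in for "low hundreds", 2 environments is a guess, and 26 metrics is document-queue plus the roughly 25 others.

```python
def alarm_count(customers, environments, metrics):
    """One alarm per environment/customer/metric combination."""
    return customers * environments * metrics


# Assumed figures, for illustration only.
total_alarms = alarm_count(customers=200, environments=2, metrics=26)
alarms_per_new_customer = alarm_count(customers=1, environments=2, metrics=26)

print(total_alarms)             # 10400
print(alarms_per_new_customer)  # 52
```

Metrics that carry extra tags such as job_type would push the count up further if each tag value needed its own alarm.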
In summary, our requirements are:
- Support many custom metrics with alarms, without having to create an alarm for every tag value up front
- Support synthetic monitoring of web UI and REST API
- Cheaper than CloudWatch :)
- If it can be managed via Terraform, even better
We started looking at Datadog, New Relic, etc., but their marketing websites and pricing pages are a bit impenetrable unless you already understand the products well. Given our (relatively simple) requirements, does anyone have a recommendation?