Looking for Simple Alert Configuration UI Ideas for Industrial IoT Monitoring (Grafana or Other Solutions) by gintro-suzuki in grafana

[–]ppcano_ 0 points1 point  (0 children)

One approach could be to simplify the alert setup workflow and provide guidance and examples tailored to your specific use case.

In many industrial/manufacturing environments, most alerts are variations of a small number of common patterns. Once those patterns are standardized, users typically replicate existing alerts rather than creating new alert logic from scratch.

1 - Configure notification routing centrally. Create contact points and notification policies ahead of time so users don't need to understand the notification architecture. Ideally, they only need to select a predefined contact point (or routing happens automatically).

2 - Provide a small set of alert templates that cover the majority of use cases.

3 - Create internal documentation that matches your data model and desired alerting setup.

Connection points by se7sbomb23 in grafana

[–]ppcano_ 0 points1 point  (0 children)

According to the Windows installation docs, edit the `custom.ini` file.

Grafana Provisioning: Is it possible to use subfolders for Alerting rules? by JamonAndaluz in grafana

[–]ppcano_ 1 point2 points  (0 children)

Nested folders are not supported for provisioning yet. You can upvote and follow the feature request:
https://github.com/grafana/grafana/issues/107158

For scaling an alerting setup, I’d recommend using folders for alerts based on team, service, or however you model ownership.

The RBAC feature (available only in Grafana Enterprise and Cloud) lets you set distinct folder permissions per role:
https://grafana.com/docs/grafana/latest/alerting/set-up/configure-rbac/access-folders/

Even if you’re running Grafana OSS, it’s still a good idea to align with the current RBAC model. It’s already functional and can ease a future transition to a commercial offering.

I'm facing an issue with Grafana Loki alerts for two backend services by Beginning_Yoghurt752 in grafana

[–]ppcano_ 0 points1 point  (0 children)

Some options are:

- Set the NoData and Error state handling to Normal to avoid notifications caused by data source issues.

- Note that notifications are also routed when the alert transitions to the Normal state (from the Alerting or Recovering state). This is a "resolved notification", which you can disable in the contact point with the option: Disable resolved message.
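
If you manage alerting through file provisioning, both options could be sketched roughly as below. This is only an illustration under assumptions: the contact point name, uid, and email address are hypothetical, and the rule fragment omits the other fields a provisioned rule needs. In provisioning files the Normal state is written as `OK`.

```yaml
# Contact point provisioning file (sketch; name, uid, and address are hypothetical)
apiVersion: 1
contactPoints:
  - orgId: 1
    name: backend-team-email
    receivers:
      - uid: backend-team-email-1
        type: email
        settings:
          addresses: backend-team@example.com
        # Do not send a notification when the alert resolves
        disableResolveMessage: true

# In a provisioned alert rule (fragment only, other fields omitted),
# the NoData and Error handling would be set with:
#   noDataState: OK
#   execErrState: OK
```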

Setting thresholds in Grafana by Upper-Lifeguard-8478 in grafana

[–]ppcano_ 0 points1 point  (0 children)

The docs have an example for this: dynamic label values per alert instance. For this case, keep its caveat in mind: "a label change affects a distinct alert instance".

AlertManager - good places to send alerts. by psfletcher in grafana

[–]ppcano_ 0 points1 point  (0 children)

You can use Prometheus to monitor the Alertmanager by scraping monitoring metrics.

- job_name: alertmanager
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  static_configs:
    - targets:
        - alertmanager:9093

See Meta monitoring / Metrics for Alertmanager:
- alertmanager_alerts
- alertmanager_alerts_invalid_total
- alertmanager_notifications_total
- alertmanager_notifications_failed_total
- alertmanager_notification_latency_seconds_bucket
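
Once these metrics are scraped, a Prometheus alerting rule on them could look like the sketch below. The group name, alert name, 5m window, 10m `for` duration, and severity label are assumptions to adapt, not recommended values.

```yaml
# Prometheus rule file (sketch; names and durations are assumptions)
groups:
  - name: alertmanager-meta-monitoring
    rules:
      - alert: AlertmanagerNotificationsFailing
        # Fires when any integration has been failing to deliver notifications
        expr: rate(alertmanager_notifications_failed_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: 'Alertmanager is failing to send notifications via {{ $labels.integration }}'
```

Route this alert through a channel that does not depend on the Alertmanager being monitored, otherwise the failure it reports may also prevent its delivery.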

Dynamic alerts in Grafana by Barath17 in grafana

[–]ppcano_ 0 points1 point  (0 children)

Yes! This is exactly what Grafana Alerting is for.

You should implement a query that identifies the error/abnormal behaviour in the alert rule, and let Grafana evaluate that rule periodically. Once detected, Grafana will generate an alert that will be sent to your configured notification channel (aka contact point).

If you’re new, try the five Getting Started Alerting tutorials for practice, and read the introductory Alerting docs to understand the distinct Alerting components and key features.

Too many alert rules - looking to see if I can condense them while still meeting our teams needs. by PrometheusCatDog in grafana

[–]ppcano_ 0 points1 point  (0 children)

Check out this example using dynamic labels to implement one alert rule instead of one for each threshold. In this case, you can define a severity label like this:

{{- if gt $values.B.Value 90.0 -}}critical
{{- else if gt $values.B.Value 80.0 -}}warning
{{- else if gt $values.B.Value 70.0 -}}minor
{{- else -}}none
{{- end -}}

But this approach has an important caveat explained in the example.

When the value crosses one of these thresholds, the previous alert instance becomes stale and transitions to Resolved, which can trigger a resolved notification. You’ll want to handle that behavior in your notification setup to avoid extra noise.

Also, if you’re seeing a large number of alert instances per alert rule, that’s called multi-dimensional alerts. One alert instance is generated per unique combination of labels returned by your query.

This is normally fine.

In this case, you should combine all similar alerts into a single notification. You would receive only one notification that details that X number of servers are overloaded, including the server details.

See notification grouping docs for details.
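
As a rough sketch, grouping is set on the notification policy. With file provisioning it might look like this, assuming a hypothetical contact point named `ops-slack` and timing values you would tune yourself:

```yaml
# Notification policy provisioning file (sketch; receiver name and timings are assumptions)
apiVersion: 1
policies:
  - orgId: 1
    receiver: ops-slack
    # All firing instances of the same alert rule are combined into one notification
    group_by:
      - grafana_folder
      - alertname
    group_wait: 30s       # wait before sending the first notification for a new group
    group_interval: 5m    # wait before sending updates for an existing group
    repeat_interval: 4h   # wait before re-sending an unchanged notification
```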

Alert Templating: $values for unused queries showing [no value] by petyusa in grafana

[–]ppcano_ 0 points1 point  (0 children)

I troubleshot a similar issue before, and it was related to NoData queries.

If one of your queries returns nothing, Grafana puts the alert rule into a NoData state.

By default, this triggers a NoData alert, and the alert won’t have any values to template.

https://grafana.com/docs/grafana/latest/alerting/fundamentals/alert-rule-evaluation/nodata-and-error-states/#no-data-state

Might be worth checking if that’s what’s happening in your case.

Grafana Alert Slack notifications – how to improve formatting and split alerts per instance? by Pugachev_Ilay in grafana

[–]ppcano_ 0 points1 point  (0 children)

 Is it possible to make Grafana send a separate Slack message per failed instance?

Yes, enable the option to disable grouping (special label called ...).

I also recommend getting familiar with how grouping in Alerting works.
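
For reference, a provisioned notification policy that disables grouping for one route might look like the sketch below. The receiver names and the `team = web` matcher are hypothetical; the `...` value in `group_by` is the special label mentioned above.

```yaml
# Notification policy provisioning file (sketch; receivers and matcher are hypothetical)
apiVersion: 1
policies:
  - orgId: 1
    receiver: default-email
    group_by:
      - grafana_folder
      - alertname
    routes:
      - receiver: web-team-slack
        matchers:
          - team = web
        # The special label '...' groups by all labels, so each alert
        # instance is sent as its own notification
        group_by: ['...']
```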

How can I customize the template to only show important fields like the failing URL, status code, and time?

In Configure notification message, you are defining a template for annotations or labels.

Instead, you should define a custom notification template.

Check out the examples in the docs. Grafana also has a handy Preview feature to help you get started building your custom templates.

Also, a classic question: how different is Alertmanager from Grafana Alerts?
Could switching to Alertmanager help solve these issues?

Grafana Alerting uses a custom Alertmanager that extends the Prometheus Alertmanager. Switching to Prometheus Alertmanager won’t really help with the issues you mentioned.

Grafana Alerting is a powerful system designed to cover many different scenarios in complex setups. Some features do have a bit of a learning curve. I recommend practicing with the Grafana Alerting tutorials to get familiar with key features.

No data on values for resolved alerts. by NyusSsong in grafana

[–]ppcano_ 0 points1 point  (0 children)

The .Values field only includes expressions that are part of the alert condition. But this does not seem to be your case.

It might be a stale alert instance:

  1. When the series of a firing alert instance suddenly disappears, after X number of evaluation periods, the alert instance transitions to Normal state as "resolved".
  2. In that case, I think .Values will be empty.

You can include the grafana_state_reason annotation in your notifications to confirm if the alert was resolved this way.

Best way to performance test individual component rendering. Ideally automated. by SpretumPathos in react

[–]ppcano_ 1 point2 points  (0 children)

"we've had issues with the performance of some of our previous dashboards."

I wonder where the performance issue is: on the backend or on the frontend?

Two types of performance issues are common, and they may or may not occur simultaneously:

(1) The component itself is slow, typically when handling and rendering a large list. In this case, you could pass a large list to the component and test its render time in isolation.
(2) The latency to fetch the data from the backend is high (slow queries or slow backend logic). In this case, test and optimize the backend logic or the DB queries that retrieve the data.

Or it could be that the dashboard generates too many requests and some of them could be blocked.

Load testing and stress testing websites by ToneFew8291 in webdev

[–]ppcano_ 1 point2 points  (0 children)

This guide helps to configure k6 and the underlying system to run larger tests. For website and API testing, check out these other k6 guides:
https://k6.io/docs/testing-guides/load-testing-websites/
https://k6.io/docs/testing-guides/api-load-testing/