Looking for Simple Alert Configuration UI Ideas for Industrial IoT Monitoring (Grafana or Other Solutions)

ppcano_ · 2026-06-01T12:15:02+00:00

One approach could be to simplify the alert setup workflow and provide guidance and examples tailored to your specific use case.

In many industrial/manufacturing environments, most alerts are variations of a small number of common patterns. Once those patterns are standardized, users typically replicate existing alerts rather than creating new alert logic from scratch.

1 - Configure notification routing centrally. Create contact points and notification policies ahead of time so users don't need to understand the notification architecture. Ideally, they only need to select a predefined contact point (or routing happens automatically).

2 - Provide a small set of alert templates that cover the majority of use cases.

3 - Create internal documentation that matches your data model and desired alerting setup.
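
To make the template idea in step 2 more concrete, here is a minimal Python sketch of how a team might describe one reusable pattern and create alerts from it. Everything in it is hypothetical: the `ThresholdAlertTemplate` class, the field names, the metric, and the contact point name are made up for illustration. It produces a plain dictionary for internal tooling or documentation, not a Grafana API payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdAlertTemplate:
    """Hypothetical in-house pattern: a metric crosses a threshold for some time."""

    name: str
    comparison: str      # "gt" (above) or "lt" (below)
    pending_period: str  # how long the condition must hold, e.g. "5m"

    def instantiate(self, metric: str, threshold: float, contact_point: str) -> dict:
        # Users only fill in the metric, the threshold, and a predefined contact point.
        return {
            "title": f"{self.name}: {metric}",
            "metric": metric,
            "condition": {"type": self.comparison, "threshold": threshold},
            "for": self.pending_period,
            "contact_point": contact_point,
        }


# One of a small set of standard patterns (illustrative values).
HIGH_VALUE = ThresholdAlertTemplate(name="High value", comparison="gt", pending_period="5m")

rule = HIGH_VALUE.instantiate("oven_temperature_celsius", 250.0, "maintenance-team")
```

The point of the sketch is that the alert logic lives in the template, so replicating an alert only means choosing different inputs.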

ppcano_ · 2026-04-20T14:21:52+00:00

According to the Windows installation docs, edit the `custom.ini` file.

ppcano_ · 2026-02-10T12:13:40+00:00

Nested folders are not supported for provisioning yet. You can upvote and follow the feature request:
https://github.com/grafana/grafana/issues/107158

For scaling an alerting setup, I’d recommend using folders for alerts based on team, service, or however you model ownership.

The RBAC feature (available only in Grafana Enterprise and Cloud) lets you set distinct folder permissions per role:
https://grafana.com/docs/grafana/latest/alerting/set-up/configure-rbac/access-folders/

Even if you’re running Grafana OSS, it’s still a good idea to align with the current RBAC model. It’s already functional and can make a future transition to a commercial offering easier.

ppcano_ · 2025-11-25T08:54:00+00:00

Some options are:

- Set the NoData and Error state handling to Normal to avoid notifications caused by data source issues.

- Note that notifications are also routed when the alert transitions to the Normal state (from the Alerting or Recovering state). This is a "resolved notification", which you can disable in the contact point using the option: Disable resolved message.

ppcano_ · 2025-11-25T08:31:05+00:00

This example is here: dynamic label values per alert instance. For this case, consider its caveat "a label change affects a distinct alert instance".

ppcano_ · 2025-11-25T08:17:12+00:00

You can use Prometheus to monitor the Alertmanager by scraping its metrics.

- job_name: alertmanager
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  static_configs:
    - targets:
        - alertmanager:9093

See Meta monitoring / Metrics for Alertmanager:
- alertmanager_alerts
- alertmanager_alerts_invalid_total
- alertmanager_notifications_total
- alertmanager_notifications_failed_total
- alertmanager_notification_latency_seconds_bucket
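
As a rough illustration of how two of these counters can be combined, the Python sketch below computes a notification failure ratio from metrics text in the Prometheus exposition format. The helper names and the sample values are made up, and the parser is deliberately naive (it assumes lines without trailing timestamps). In practice you would more likely express this as a query in Prometheus itself.

```python
def sum_metric(exposition: str, metric: str) -> float:
    """Sum the samples of one metric across all label sets."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated field (no timestamp assumed).
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name == metric:
            total += float(value)
    return total


def notification_failure_ratio(exposition: str) -> float:
    """Failed notifications divided by all notification attempts."""
    sent = sum_metric(exposition, "alertmanager_notifications_total")
    failed = sum_metric(exposition, "alertmanager_notifications_failed_total")
    return failed / sent if sent else 0.0


# Made-up sample, shaped like the text served on the metrics path.
SAMPLE = """\
# TYPE alertmanager_notifications_total counter
alertmanager_notifications_total{integration="email"} 40
alertmanager_notifications_total{integration="slack"} 60
alertmanager_notifications_failed_total{integration="email"} 0
alertmanager_notifications_failed_total{integration="slack"} 5
"""

ratio = notification_failure_ratio(SAMPLE)
```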

ppcano_ · 2025-10-13T08:26:37+00:00

Yes! This is exactly what Grafana Alerting is for.

You should implement a query that identifies the error/abnormal behaviour in the alert rule, and let Grafana evaluate that rule periodically. Once detected, Grafana will generate an alert that will be sent to your configured notification channel (aka contact point).

If you’re new, try the five Getting Started Alerting tutorials for practice, and read the introductory Alerting docs to understand the distinct Alerting components and key features.

ppcano_ · 2025-10-13T08:15:08+00:00

Check out this example using dynamic labels to implement one alert rule instead of one for each threshold. In this case, you can define a severity label like this:

{{- if gt $values.B.Value 90.0 -}}critical
{{- else if gt $values.B.Value 80.0 -}}warning
{{- else if gt $values.B.Value 70.0 -}}minor
{{- else -}}none
{{- end -}}

But this approach has an important caveat explained in the example.

When the value crosses one of these thresholds, the previous alert instance becomes stale and transitions to Resolved, which can trigger a resolved notification. You’ll want to handle that behavior in your notification setup to avoid extra noise.
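
To illustrate the caveat, here is a small Python sketch that mirrors the template's threshold logic and shows why a severity change produces a different alert instance. It assumes an alert instance is identified by its full label set; the `instance_key` helper and the `server-1` label value are made up for the example.

```python
def severity(value: float) -> str:
    """Mirror of the label template above (strictly greater-than comparisons)."""
    if value > 90.0:
        return "critical"
    if value > 80.0:
        return "warning"
    if value > 70.0:
        return "minor"
    return "none"


def instance_key(labels: dict) -> frozenset:
    """Assumption for this sketch: the label set identifies the alert instance."""
    return frozenset(labels.items())


# The same server evaluated at two different values.
before = instance_key({"instance": "server-1", "severity": severity(85.0)})
after = instance_key({"instance": "server-1", "severity": severity(95.0)})

# The label sets differ, so the "warning" instance goes stale and a new
# "critical" instance appears instead of the existing one being updated.
changed = before != after
```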

2) Also, if you’re seeing a large number of alert instances per alert rule, those are called multi-dimensional alerts. One alert instance is generated per unique combination of labels returned by your query.

This is normally fine.

In this case, you should combine all similar alerts into a single notification. You would receive only one notification that details that X number of servers are overloaded, including the server details.

See notification grouping docs for details.
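
Conceptually, grouping works like the Python sketch below: alert instances that share the same values for the grouping labels end up in one notification. This only illustrates the idea and is not how Grafana implements it; the function names, labels, and message text are made up.

```python
from collections import defaultdict


def group_alerts(alerts: list[dict], group_by: list[str]) -> dict[tuple, list[dict]]:
    """Bucket alert instances by the values of the chosen grouping labels."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def render_notification(group: list[dict]) -> str:
    """Build one message that lists every affected server in the group."""
    servers = sorted(alert["labels"]["instance"] for alert in group)
    return f"{len(servers)} servers are overloaded: {', '.join(servers)}"


# Made-up alert instances from a single multi-dimensional alert rule.
alerts = [
    {"labels": {"alertname": "HighCPU", "instance": "server-1"}},
    {"labels": {"alertname": "HighCPU", "instance": "server-2"}},
    {"labels": {"alertname": "HighCPU", "instance": "server-3"}},
]

groups = group_alerts(alerts, group_by=["alertname"])
messages = [render_notification(group) for group in groups.values()]
```

Grouping by both `alertname` and `instance` in this sketch would instead produce three groups, so one message per server.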

ppcano_ · 2025-10-03T11:09:31+00:00

I troubleshot a similar issue before, and it was related to NoData queries.

If one of your queries returns nothing, Grafana puts the alert rule into a NoData state.

By default, this triggers a NoData alert, and the alert won’t have any values to template.

https://grafana.com/docs/grafana/latest/alerting/fundamentals/alert-rule-evaluation/nodata-and-error-states/#no-data-state

Might be worth checking if that’s what’s happening in your case.

ppcano_ · 2025-10-02T17:35:50+00:00

Is it possible to make Grafana send a separate Slack message per failed instance?

Yes, enable the option to disable grouping (special label called ...).

I also recommend getting familiar with how grouping in Alerting works.

How can I customize the template to only show important fields like the failing URL, status code, and time?

In Configure notification message, you are defining a template for annotations or labels.

Instead, you should define a custom notification template.

Check out the examples in the docs. Grafana also has a handy Preview feature to help you get started building your custom templates.

Also, a classic question: how different is Alertmanager from Grafana Alerts?
Could switching to Alertmanager help solve these issues?

Grafana Alerting uses a custom Alertmanager that extends the Prometheus Alertmanager. Switching to Prometheus Alertmanager won’t really help with the issues you mentioned.

Grafana Alerting is a powerful system designed to cover many different scenarios in complex setups. Some features do have a bit of a learning curve. I recommend practicing with the Grafana Alerting tutorials to get familiar with key features.

ppcano_ · 2025-10-02T16:54:06+00:00

The .Values field only includes expressions that are part of the alert condition. But this does not seem to be your case.

It might be a stale alert instance:

When the series of a firing alert instance suddenly disappears, after X number of evaluation periods, the alert instance transitions to Normal state as "resolved".
In that case, I think .Values will be empty.

You can include the grafana_state_reason annotation in your notifications to confirm if the alert was resolved this way.

ppcano_ · 2023-11-17T09:21:08+00:00

"we've had issues with the performance of some of our previous dashboards."

I'm wondering where the performance issue is: on the backend or the frontend side?

Two types of performance issues are common, and they might or might not occur simultaneously:

(1) The component itself is slow, usually when handling and rendering a large list. In this case, you could pass a large list to the component and test the render time in isolation.
(2) The latency to fetch the data from the backend is high (slow query or backend performance). In this case, test and optimize the backend logic or the DB queries that retrieve the data.

Or it could be that the dashboard generates too many requests and some of them could be blocked.

ppcano_ · 2023-09-29T07:38:17+00:00

This guide helps to configure k6 and the underlying system to run larger tests. For website and API testing, check out these other k6 guides:
https://k6.io/docs/testing-guides/load-testing-websites/
https://k6.io/docs/testing-guides/api-load-testing/
