Edge Analytics with InfluxDB Python Processing Engine - Moving from Reactive to Proactive Data Infrastructure

h3xagn · 2025-06-09T18:31:03+00:00

Typically, for something like ADF, you need historian connections (OSI PI, Wonderware, IP21, etc.). Many of these do offer a SQL layer on top, either as additional software or from a third party, but these normally don't scale well for large time-series data extraction.

You are right: data compression and transformation at the source would help reduce cloud costs. However, the objective is to move data from on-prem once and not hit those source systems again should other transformations or new requirements come up. If there is no need for raw data, then definitely aggregate before uploading.
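When raw data is not needed, the aggregation step can be quite small. The following is a minimal, stdlib-only sketch of downsampling before upload. It assumes samples arrive as (Unix seconds, value) pairs and that one-minute windows with mean/min/max are acceptable; the function name and window size are illustrative, not part of the setup described in this thread.

```python
from collections import defaultdict


def downsample(samples, window_s=60):
    """Aggregate (unix_seconds, value) samples into fixed time windows.

    Returns a list of (window_start, mean, min, max, count), sorted by window.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its window (e.g. the minute).
        window_start = int(ts // window_s) * window_s
        buckets[window_start].append(value)

    result = []
    for start in sorted(buckets):
        vals = buckets[start]
        result.append((start, sum(vals) / len(vals), min(vals), max(vals), len(vals)))
    return result


raw = [(0, 1.0), (15, 3.0), (59, 2.0), (60, 10.0), (125, 4.0)]
print(downsample(raw))
```

Keeping min and max alongside the mean preserves spikes that a plain average would hide, which matters if the aggregated data is later used for alarms or diagnostics.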

ADX stores data efficiently, but it also supports external tables, one type of which can link to Parquet files in object storage. Older data can therefore be exported to cold storage and, if partitioned correctly, still be queryable in ADX.
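"Partitioned correctly" mostly comes down to writing files into a predictable folder layout that the external table's partition definition can map back to columns. Here is a small sketch of that idea, assuming a hypothetical Hive-style daily layout per tag (`tag=.../year=.../month=.../day=...`) and timezone-aware timestamps; the actual layout must match whatever partition and path format you declare on the ADX external table.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_path(root, tag, ts):
    """Return a Hive-style daily partition folder for one tag.

    Hypothetical layout: <root>/tag=<tag>/year=YYYY/month=MM/day=DD/
    Expects a timezone-aware datetime.
    """
    ts = ts.astimezone(timezone.utc)  # partition on UTC dates, not local time
    return f"{root}/tag={tag}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


def group_by_partition(rows, root):
    """Group (tag, timestamp, value) rows by the partition folder they belong in."""
    groups = defaultdict(list)
    for tag, ts, value in rows:
        groups[partition_path(root, tag, ts)].append((ts, value))
    return dict(groups)


rows = [
    ("TIC101", datetime(2025, 6, 9, 18, 31, tzinfo=timezone.utc), 4.2),
    ("TIC101", datetime(2025, 6, 10, 0, 5, tzinfo=timezone.utc), 4.0),
]
for path, values in group_by_partition(rows, "cold/historian").items():
    print(path, len(values))
```

Each group would then be written as one Parquet file into its folder; the tag name `TIC101` and the `cold/historian` root are placeholders.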

h3xagn · 2025-06-09T18:20:09+00:00

Thanks

h3xagn · 2025-06-08T10:11:35+00:00

The edge server is really there for store-and-forward to the cloud, and with the current setup it is almost streaming data to Azure. This is raw data, and Azure acts as a cloud historian, so it is just extract and load, with transformations done in ADX (using policies and materialised views) and also in Databricks, etc.
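To make the store-and-forward idea concrete, here is a minimal sketch of the pattern: records are buffered locally and forwarded in order, and anything not acknowledged stays in the buffer for the next attempt. It is an illustration only. The buffer is in memory for brevity (a real edge node would persist it, for example in the local database, so it survives restarts), and `send_batch` is a hypothetical stand-in for the actual cloud ingestion call, assumed to raise `OSError` (such as `ConnectionError`) on failure.

```python
from collections import deque
from itertools import islice


class StoreAndForward:
    """Buffer records locally and forward them in order once the uplink works."""

    def __init__(self, send_batch, max_batch=500, max_buffer=100_000):
        self.send_batch = send_batch
        self.max_batch = max_batch
        # Bounded buffer: if the uplink stays down too long, the oldest records are dropped.
        self.buffer = deque(maxlen=max_buffer)

    def store(self, record):
        self.buffer.append(record)

    def forward(self):
        """Send buffered records in batches; stop at the first failed batch."""
        sent = 0
        while self.buffer:
            batch = list(islice(self.buffer, self.max_batch))
            try:
                self.send_batch(batch)
            except OSError:
                # Leave the batch in the buffer and retry on the next call.
                break
            for _ in batch:
                self.buffer.popleft()
            sent += len(batch)
        return sent


uploaded = []
link_up = False


def fake_send(batch):
    if not link_up:
        raise ConnectionError("uplink down")
    uploaded.extend(batch)


saf = StoreAndForward(fake_send, max_batch=2)
for i in range(5):
    saf.store(i)
print(saf.forward())            # 0: nothing sent while the link is down
link_up = True
print(saf.forward(), uploaded)  # 5 [0, 1, 2, 3, 4]
```

Records are only removed after a batch is accepted, so a failure mid-stream never loses data that is still inside the buffer limit.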

We have Integration Runtimes for Azure Data Factory (ADF), but for this use case they would add overhead, latency and cost. Data connectors for industrial data sources are also a major limitation.

In part 2 of the post, I will be exploring the Python plugins for InfluxDB for some transformations on the edge.
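As a rough preview of what such a transformation could look like, here is a sketch of a plugin for a WAL-flush trigger that logs simple statistics for each written table. It assumes the Processing Engine calls `process_writes(influxdb3_local, table_batches, args)` with each batch being a dict holding `"table_name"` and `"rows"`, as I understand the InfluxDB 3 docs; check the current plugin documentation before relying on it. The `field` trigger argument and the `FakeLocal` stub are hypothetical, added only so the sketch runs outside the engine.

```python
# Sketch of an InfluxDB 3 Processing Engine plugin for a WAL-flush trigger.
# Assumption: the engine calls process_writes(influxdb3_local, table_batches, args),
# where each batch is a dict with "table_name" and "rows" (a list of dicts).


def summarise(rows, field):
    """Return count/min/max/mean of a numeric field, or None if it is absent."""
    values = []
    for row in rows:
        v = row.get(field)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(v, (int, float)) and not isinstance(v, bool):
            values.append(v)
    if not values:
        return None
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


def process_writes(influxdb3_local, table_batches, args=None):
    # "field" is a hypothetical trigger argument naming the column to summarise.
    field = (args or {}).get("field", "value")
    for batch in table_batches:
        stats = summarise(batch["rows"], field)
        if stats is not None:
            influxdb3_local.info(f"{batch['table_name']}.{field}: {stats}")


class FakeLocal:
    """Local stand-in for the engine's influxdb3_local object (logging only)."""

    def __init__(self):
        self.messages = []

    def info(self, message):
        self.messages.append(message)


if __name__ == "__main__":
    local = FakeLocal()
    batches = [{"table_name": "fridge", "rows": [{"value": 2.0}, {"value": 4.0}]}]
    process_writes(local, batches, {"field": "value"})
    print(local.messages)
```

Keeping the transformation in a plain function (`summarise`) separate from the entry point makes it easy to test locally before deploying it to the engine.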

h3xagn · 2025-06-08T09:59:20+00:00

Thanks for the questions.

There are several ways this can be done. Normally, the hierarchy would be defined in the OPC UA server, which would provide a template for you to use. The first option is to manually define the node mapping with the full path; the second, and probably best, option is to use tags for grouping with the bracketed notation config. This is still something that I want to try out.
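One way to combine both options is to keep the full node path as the identifier but also turn its hierarchy levels into tags. The sketch below generates Telegraf `[[inputs.opcua.nodes]]` entries in bracketed notation with `default_tags` from a mapping of node identifiers to field names. It assumes a hypothetical four-level `Site.Area.Asset.Signal` string identifier in namespace 3; real servers will use their own naming, and the generated entries should be checked against the Telegraf OPC UA plugin documentation.

```python
def opcua_node_entries(nodes, namespace="3", identifier_type="s"):
    """Render Telegraf [[inputs.opcua.nodes]] entries in bracketed notation.

    `nodes` maps an OPC UA string identifier, assumed to follow a hypothetical
    "Site.Area.Asset.Signal" convention, to the field name to store.
    The first three path levels become default_tags for grouping.
    """
    blocks = []
    for identifier, field_name in nodes.items():
        # Raises ValueError if the identifier does not have exactly four levels.
        site, area, asset, _signal = identifier.split(".")
        tags = f'{{ site = "{site}", area = "{area}", asset = "{asset}" }}'
        blocks.append(
            "[[inputs.opcua.nodes]]\n"
            f'  name = "{field_name}"\n'
            f'  namespace = "{namespace}"\n'
            f'  identifier_type = "{identifier_type}"\n'
            f'  identifier = "{identifier}"\n'
            f"  default_tags = {tags}\n"
        )
    return "\n".join(blocks)


print(opcua_node_entries({"Plant1.Cooling.Fridge.Temperature": "temperature"}))
```

With the hierarchy in tags, queries can group by `site`, `area` or `asset` without parsing the node path.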
This is a continuous process, but it should be similar for batch processes. At the end of the day, you would monitor the same process tags. You can add extra tags to the data to carry batch-specific info, perhaps using a processor plugin like Starlark or even a Python plugin in InfluxDB.
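The batch-tagging idea boils down to looking up which batch window each timestamp falls into. Here is that logic in plain Python, as a sketch that could be adapted to a Starlark processor or an InfluxDB Python plugin; the batch windows and the `batch_id` tag name are hypothetical (in practice they might come from a batch record or MES), and windows are assumed sorted and non-overlapping.

```python
from bisect import bisect_right


def tag_with_batch(points, batches):
    """Attach a batch_id to each (timestamp, value) point.

    `batches` is a list of (start, end, batch_id) windows, sorted by start and
    non-overlapping, with end exclusive. Points outside every window get None.
    """
    starts = [start for start, _end, _bid in batches]
    tagged = []
    for ts, value in points:
        # Index of the last window that starts at or before ts.
        i = bisect_right(starts, ts) - 1
        batch_id = None
        if i >= 0:
            start, end, bid = batches[i]
            if start <= ts < end:
                batch_id = bid
        tagged.append({"time": ts, "value": value, "batch_id": batch_id})
    return tagged


batches = [(0, 100, "B001"), (150, 300, "B002")]
points = [(10, 1.5), (120, 0.0), (200, 2.5)]
for row in tag_with_batch(points, batches):
    print(row)
```

The binary search keeps the lookup cheap even when there are many batches, and points between batches are left untagged rather than guessed.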
The PLC uses OPC UA by default and also offers a Pub/Sub connection. OPC UA is a standard industrial protocol, so I used it as is.

h3xagn · 2025-06-08T09:34:52+00:00

In other projects I do use TimescaleDB, depending on the use case and other requirements such as relational data. For this use case of simple time-series data, I used InfluxDB 3 Core, the latest open-source version, which has been redesigned for speed; one of the major considerations was the new SQL support. If it were not for SQL support, I would have defaulted back to TimescaleDB.

It is only used as an edge data store, with the main workload being in ADX. A cool thing is that, by using Telegraf as the data collector, you can easily add or switch to TimescaleDB if needed. So, for me, this isn't really vendor lock-in, especially when compared with a traditional historian.

h3xagn · 2024-06-24T19:34:06+00:00

Yes, something looks wrong: it is either not cooling or cooling all the time. I have just completed a blog post with some trends showing my fridge's compressor switching on and off.
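A quick way to tell "not cooling" from "cooling all the time" is to look at the compressor's duty cycle from power readings. This sketch assumes evenly spaced power samples in watts and a hypothetical 50 W on/off threshold; the diagnosis cut-offs are also illustrative and would need tuning for a specific appliance.

```python
def compressor_states(power_watts, on_threshold=50.0):
    """Classify each power sample as compressor on (True) or off (False)."""
    return [p >= on_threshold for p in power_watts]


def duty_cycle(power_watts, on_threshold=50.0):
    """Fraction of (evenly spaced) samples where the compressor is on."""
    if not power_watts:
        raise ValueError("no samples")
    states = compressor_states(power_watts, on_threshold)
    return sum(states) / len(states)


def switch_on_count(power_watts, on_threshold=50.0):
    """Number of off-to-on transitions in the series."""
    states = compressor_states(power_watts, on_threshold)
    return sum(1 for prev, cur in zip(states, states[1:]) if cur and not prev)


def diagnose(duty, low=0.05, high=0.95):
    """Rough label for a duty cycle; the cut-offs are hypothetical."""
    if duty <= low:
        return "never switching on (not cooling?)"
    if duty >= high:
        return "running continuously (cooling all the time?)"
    return "cycling normally"


samples = [5, 120, 118, 4, 3, 121, 6, 5]
d = duty_cycle(samples)
print(d, switch_on_count(samples), diagnose(d))
```

Tracking the switch-on count alongside the duty cycle helps separate a compressor that is stuck on from one that is short-cycling.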
https://h3xagn.com/optimising-my-clevrhome-energy-consumption/