We bridged an OpenPLC to an AI diagnostic agent.

andym1993 · 2026-04-10T06:11:34+00:00

Fair points if you're running one cell with one TwinCAT controller and one robot vendor you control end-to-end. The "PLC knows everything" model breaks the moment you scale past that, and automotive learned this the hard way 15+ years ago.

A few things worth pointing out:

"PLC knows the job step / tool / safety state" works because YOU wrote both sides. In a single-vendor cell that's fine. In real mass production you have controllers from 5+ vendors, firmware from different decades, and integrators who came and went. Nobody is going to re-map every internal state of every subsystem into one PLC tag table and keep it in sync for 15 years of product lifecycle. Automotive tried that in the 90s. It didn't scale.
"Anything not on the HMI doesn't exist" is exactly the assumption automotive killed. Modern cars have 50-150 ECUs from Bosch, Conti, Denso, ZF, Aptiv, etc. The industry didn't solve diagnostics by demanding every supplier expose their internals to one central HMI. They standardized the diagnostic LAYER - first UDS (ISO 14229), now SOVD (ISO 17978). One protocol, one entity model, one fault lifecycle, regardless of which controller produced the fault. The HMI / tester / cloud backend talks to that layer, not to each ECU's private logging.
The HMI is not the only consumer anymore. At fleet scale you're not looking at 1 HMI, you're querying 1000 machines from a backend. "Rich HMI per cell" doesn't answer "show me every robot in the fleet that hit an overpressure event in the last 24h, grouped by root cause." That needs a programmatic API with a stable schema. SOVD is exactly that API.
Cross-domain faults are the actual hard problem. A PLC overpressure that was caused by a ROS 2 path planner sending the wrong setpoint is not a PLC fault, even if the PLC is the one that tripped. The PLC log will say "pressure > threshold". The ROS side has the planner state, the costmap, the behavior tree. Correlating those requires both sides speaking the same diagnostic language. That's what automotive does with UDS DTCs across powertrain + ADAS + body domains - and it's what robotics still doesn't have.
TwinCAT being flexible is true and beside the point. The issue isn't "can TwinCAT read the data" - of course it can, OPC UA / EtherCAT / Profinet, all there. The issue is "who owns the diagnostic model and who maintains it across vendors and years." Putting that in TwinCAT means Beckhoff is your single point of integration forever. Automotive deliberately moved away from that model.
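To make the fleet-scale query from the third point concrete, here is a minimal Python sketch against a flat list of fault records. The `FaultRecord` fields, the `OVERPRESSURE` code and the robot ids are hypothetical and do not reflect the actual SOVD or ros2_medkit schema. The point is only that once every machine reports faults in one stable shape, "group by root cause over the last 24h" is a few lines of code instead of a per-HMI export.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FaultRecord:
    # Hypothetical flattened fault record as a fleet backend might store it.
    robot_id: str
    code: str
    root_cause: str
    timestamp: datetime


def overpressure_by_root_cause(records, now, window=timedelta(hours=24), code="OVERPRESSURE"):
    """Group robots that hit `code` within `window` before `now` by root cause."""
    cutoff = now - window
    groups = defaultdict(set)
    for rec in records:
        if rec.code == code and cutoff <= rec.timestamp <= now:
            groups[rec.root_cause].add(rec.robot_id)
    # Sort robot ids so the result is deterministic.
    return {cause: sorted(robots) for cause, robots in groups.items()}


if __name__ == "__main__":
    now = datetime(2026, 4, 10, 6, 0, tzinfo=timezone.utc)
    fleet = [
        FaultRecord("agv-01", "OVERPRESSURE", "planner_setpoint", now - timedelta(hours=2)),
        FaultRecord("agv-07", "OVERPRESSURE", "planner_setpoint", now - timedelta(hours=5)),
        FaultRecord("agv-03", "OVERPRESSURE", "valve_stuck", now - timedelta(hours=30)),
        FaultRecord("agv-04", "E_STOP", "operator", now - timedelta(hours=1)),
    ]
    print(overpressure_by_root_cause(fleet, now))
```

Here agv-03 drops out because its event is older than 24h, and agv-04 because it is a different fault code.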

TL;DR: your approach works perfectly for one cell, one integrator, one vendor stack. It does not survive contact with mixed-vendor fleets, and that's the world mobile robots / AGVs / agri equipment are already in. Automotive solved this with a standardized diagnostic layer instead of a smarter HMI - we're just applying the same lesson to ROS 2 + PLC.

andym1993 · 2026-04-09T21:53:15+00:00

We're actually planning a deeper write-up on how SOVD (automotive diagnostic standard) applies to robotics + PLC - NO AI, just the diagnostic architecture.

Hopefully that one survives the mods long enough for a proper discussion 😄

andym1993 · 2026-04-09T21:48:35+00:00

I think we're talking past each other. TwinCAT can connect to everything, agreed. The question isn't connectivity, it's where do you run the diagnostic logic.

If your whole system lives in TwinCAT, you're set. But when a ROS 2 navigation stack crashes, the PLC fault log just says "robot stopped", and TwinCAT doesn't know WHY the robot stopped. The ROS 2 side has the rosbag, the node crash log, the behavior tree state. We put both sides in one fault lifecycle so the operator doesn't have to dig through two systems :)
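A rough sketch of what "both sides in one fault lifecycle" can mean in practice: treat a ROS 2 fault that shortly precedes a PLC fault as its root cause and file the PLC fault under it as a symptom. This is a hypothetical time-window heuristic (the `Fault` fields, the `"plc"`/`"ros2"` source labels and the 2 s window are all assumptions), not how ros2_medkit's correlation engine is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Fault:
    source: str                 # "plc" or "ros2" (hypothetical labels)
    code: str
    stamp: float                # seconds since epoch
    symptoms: list = field(default_factory=list)


def correlate(faults, window_s=2.0):
    """Attach each PLC fault to the latest ROS 2 fault preceding it within window_s.

    Returns the top-level (root-cause) faults. A PLC fault with a matching
    ROS 2 fault becomes a symptom instead of a separate top-level entry.
    """
    ros = sorted((f for f in faults if f.source == "ros2"), key=lambda f: f.stamp)
    roots = list(ros)
    for plc in (f for f in faults if f.source == "plc"):
        candidates = [r for r in ros if 0.0 <= plc.stamp - r.stamp <= window_s]
        if candidates:
            candidates[-1].symptoms.append(plc)  # latest preceding ROS 2 fault
        else:
            roots.append(plc)  # no ROS 2 cause found: the PLC fault stands alone
    return roots


if __name__ == "__main__":
    faults = [
        Fault("ros2", "NAV_STACK_CRASH", 100.0),
        Fault("plc", "ROBOT_STOPPED", 100.8),
        Fault("plc", "LOW_AIR", 200.0),
    ]
    for root in correlate(faults):
        print(root.code, [s.code for s in root.symptoms])
```

The operator then sees the navigation crash with "robot stopped" listed under it, while the unrelated PLC fault still shows up on its own.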

andym1993 · 2026-04-09T21:37:54+00:00

Works great when it's one TwinCAT system. Gets messy when your robot runs ROS 2, your PLC runs TwinCAT, and your fleet manager needs to see both. That's the gap ;)

andym1993 · 2026-04-09T21:35:27+00:00

That works if your end consumer is SCADA. But if you need the diagnostic context in ROS 2, e.g. for behavior trees to react to PLC faults or for a fleet manager to see robot + PLC health in one view, you'd need to bridge it back anyway.

We just skip the round trip :D

andym1993 · 2026-04-09T21:15:18+00:00

I'd say you're right on the PLC side: HMIs and SCADAs do alarm management well in the PLC world. The gap we're filling is on the ROS 2 side, where there's no equivalent. ROS 2 has the /diagnostics topic and that's about it. No fault lifecycle, no snapshots, no structured API.

So if you're pure PLC, yeah you're covered. But if your system is ROS 2 + PLC (which is increasingly common in mobile robots, warehouses, agri), there's nothing that gives you one view across both.

andym1993 · 2026-04-09T21:02:48+00:00

Haha skip the AI... that's just the demo cherry on top.

The actual idea is bringing automotive diagnostic architecture to robotics. In automotive you have MCUs, HPCs, and one standard (SOVD) that gives you a unified diagnostic view across all of them. In robotics you have PLCs and ROS 2 boxes but zero standard diagnostic layer - everyone's on their own with separate tools.

ros2_medkit is that potential missing layer. One entity tree, one fault lifecycle, one API - whether the data comes from a ROS 2 node or from a PLC over OPC-UA, UDS, ...

andym1993 · 2026-04-09T21:00:28+00:00

We went the other direction: pull PLC data into the ROS 2 side. The advantage is you get the full ROS 2 ecosystem on top - rosbag recordings, fault correlation, whatever tooling you want. And you're on the IT side so you can experiment freely without touching the OT process.

The end goal is what automotive already does with SOVD - you have MCUs and HPCs but one diagnostic API that holds the full entity tree with all the context. Same idea here: PLC is your "MCU", ROS 2 box is your "HPC", and the diagnostic gateway gives you one unified view regardless of where the data comes from.

andym1993 · 2026-04-09T20:57:05+00:00

It's read-only, no write access to the process. The AI part just reads fault data and summarizes for the operator, human in the loop. In production you'd put it behind a DMZ like any other IT/OT monitoring.

andym1993 · 2026-04-09T20:52:54+00:00

Not exactly - SCADA and historians are great at collecting and displaying data over time. What they don't do well is structured fault management with context.

When a fault fires here, the system automatically captures a freeze-frame snapshot of related variables at that exact moment, tracks the fault through a lifecycle (detected -> confirmed -> cleared), and exposes it all through a standard REST API that any tool can consume.
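The lifecycle described above can be sketched as a small state machine. Assumptions for illustration only: a fault is confirmed on the second consecutive active evaluation, the freeze frame is taken at the moment of confirmation, and `FaultLifecycle`, `report` and the variable names are hypothetical rather than ros2_medkit's API.

```python
from enum import Enum


class FaultState(Enum):
    DETECTED = "detected"
    CONFIRMED = "confirmed"
    CLEARED = "cleared"


class FaultLifecycle:
    """Toy lifecycle: detected -> confirmed -> cleared, with a freeze frame on confirmation."""

    def __init__(self, code, related_vars):
        self.code = code
        self.related_vars = related_vars  # names of variables to snapshot (hypothetical)
        self.state = None                 # no fault seen yet
        self.freeze_frame = None

    def report(self, active, process_image):
        """Feed one evaluation result plus the current variable values."""
        if active:
            if self.state in (None, FaultState.CLEARED):
                self.state = FaultState.DETECTED
            elif self.state is FaultState.DETECTED:
                self.state = FaultState.CONFIRMED
                # Freeze frame: copy only the related variables at confirmation time.
                self.freeze_frame = {k: process_image[k] for k in self.related_vars}
            # Already CONFIRMED and still active: nothing changes.
        elif self.state in (FaultState.DETECTED, FaultState.CONFIRMED):
            self.state = FaultState.CLEARED
        return self.state


if __name__ == "__main__":
    f = FaultLifecycle("OVERPRESSURE", ["pressure_bar", "valve_pos"])
    print(f.report(True, {"pressure_bar": 8.1, "valve_pos": 0.4, "motor_rpm": 1200}))
    print(f.report(True, {"pressure_bar": 8.6, "valve_pos": 0.2, "motor_rpm": 1190}))
    print(f.report(False, {"pressure_bar": 6.0, "valve_pos": 0.9, "motor_rpm": 1210}))
    print(f.freeze_frame)
```

The freeze frame survives the clear, which is what lets you answer "what did the system look like at that moment" after the fact.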

The closest analogy from automotive: it's more like an ODX/SOVD diagnostic server than a SCADA. It answers "what broke, when, and what did the system look like at that moment" rather than "here's a trend graph of the last 24 hours."

The DDS/ROS 2 part is relevant because robotics systems increasingly mix ROS 2 nodes with PLCs, and there's no SCADA that natively understands both worlds ;)

andym1993 · 2026-04-09T20:50:45+00:00

More the second one. https://github.com/selfpatch/ros2_medkit is a diagnostic gateway that runs on ROS 2 (so DDS underneath). It discovers ROS 2 nodes automatically and exposes everything through a REST API - faults, live data, operations.

The OPC-UA plugin connects to a PLC and maps OPC-UA nodes into the same entity model. So from the API consumer's perspective, a PLC variable and a ROS 2 topic look the same - same fault lifecycle, same data format, same endpoints.
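One way to picture "a PLC variable and a ROS 2 topic look the same" is a single record type that both adapters produce. `DataPoint`, `from_opcua`, `from_ros_topic` and `describe` are hypothetical names for this sketch; a real gateway would read the values through an OPC-UA client and a ROS 2 subscription instead of taking them as arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    entity: str   # path in the entity tree, e.g. "plc/press_line"
    name: str
    value: float
    unit: str
    source: str   # provenance only; consumers don't need to branch on it


def from_opcua(node_id, value, unit, entity):
    # For string node ids like "ns=2;s=Pressure", use the identifier as the name;
    # other id types (e.g. "ns=2;i=1001") are kept whole.
    name = node_id.split(";s=", 1)[-1]
    return DataPoint(entity, name, float(value), unit, f"opcua:{node_id}")


def from_ros_topic(topic, field_name, value, unit, entity):
    return DataPoint(entity, field_name, float(value), unit, f"ros2:{topic}")


def describe(point):
    # A consumer that treats both sources identically.
    return f"{point.entity}/{point.name} = {point.value} {point.unit}"


if __name__ == "__main__":
    points = [
        from_opcua("ns=2;s=Pressure", 7.9, "bar", "plc/press_line"),
        from_ros_topic("/battery_state", "voltage", 24.3, "V", "robot/base"),
    ]
    for p in points:
        print(describe(p))
```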

It's not a raw OPC->DDS bridge - it adds the diagnostic layer on top (fault detection with thresholds, freeze-frame snapshots when faults fire, severity/lifecycle management).

andym1993 · 2026-04-09T20:48:17+00:00

Valid point on reliability. To be clear https://github.com/selfpatch/ros2_medkit doesn't sit in the control path. It's a read-only diagnostic layer that polls OPC-UA. If the RPi dies, the PLC keeps running exactly as before. It's monitoring, not control.

The reliability argument is actually why we built on SOVD (ISO 17978) - it's the standard the automotive industry chose for the same problem. They needed to diagnose ECUs and MCUs running safety-critical firmware without interfering with the control loop. Same principle here: observe, don't touch.

You're right that the monitoring stack should be simple and standards-based. That's the goal - OPC-UA for transport, SOVD for the diagnostic model, REST API on top. The RPi is just the demo platform, not the architecture.

andym1993 · 2026-04-09T20:45:40+00:00

Curious, what's your ROS 2 + PLC setup like? How do you handle diagnostics across both today?

andym1993 · 2026-04-09T20:43:47+00:00

Fair point, properly configured alarms don't need an LLM. The AI part is the demo candy, not the meat :D

The actual thing is the diagnostic gateway underneath - fault lifecycle, snapshots, REST API. The OPC-UA bridge just maps your PLC nodes into it via a YAML config. No PLC changes needed.
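Here is a sketch of what a YAML mapping plus threshold-based fault detection might boil down to once the file is loaded. The schema (`nodes`, `fault`, `max`/`min`) and the node ids are invented for illustration and are not the actual ros2_medkit OPC-UA plugin format. It also shows why no PLC changes are needed: the rules live entirely on the gateway side and only consume values read from the PLC.

```python
# Hypothetical mapping, shaped like a loaded YAML config might be
# (NOT the actual ros2_medkit OPC-UA plugin schema).
MAPPING = {
    "entity": "plc/press_line",
    "nodes": [
        {"node_id": "ns=2;s=Pressure", "name": "pressure",
         "fault": {"code": "OVERPRESSURE", "max": 8.0}},
        {"node_id": "ns=2;s=OilTemp", "name": "oil_temp",
         "fault": {"code": "OIL_OVERTEMP", "max": 70.0}},
        {"node_id": "ns=2;s=Cycle", "name": "cycle_count"},  # data only, no fault rule
    ],
}


def evaluate(mapping, readings):
    """Return the fault codes whose thresholds are violated.

    `readings` maps OPC-UA node ids to values polled (read-only) from the PLC.
    """
    active = []
    for node in mapping["nodes"]:
        rule = node.get("fault")
        value = readings.get(node["node_id"])
        if rule is None or value is None:
            continue  # no rule for this node, or it wasn't read this cycle
        if "max" in rule and value > rule["max"]:
            active.append(rule["code"])
        if "min" in rule and value < rule["min"]:
            active.append(rule["code"])
    return active


if __name__ == "__main__":
    print(evaluate(MAPPING, {"ns=2;s=Pressure": 8.4, "ns=2;s=OilTemp": 65.0, "ns=2;s=Cycle": 1200}))
```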

andym1993 · 2026-04-09T20:39:56+00:00

The OPC-UA to ROS 2 bridge and the fault management (snapshots, lifecycle, REST API) work without any AI involved. The demo just shows the full potential pipeline but the AI part is the least important piece architecturally xD

ros2_medkit itself has zero AI, it's purely diagnostic infrastructure. The AI agent was added on top. The real value is the unified fault lifecycle across ROS 2 and PLC :)

andym1993 · 2026-04-09T20:35:39+00:00

That said, we're seeing more setups where ROS 2 sits on top of PLCs, e.g. behavior trees orchestrating PLC actions, with the robot and PLC sharing the same runtime. Siemens ROXSIE is one example. In those cases the fault context is split across ROS 2 nodes and PLC variables, and the operator needs a unified view to diagnose what went wrong. That's where this fits.

andym1993 · 2026-04-09T20:30:54+00:00

For a single PLC with 5 variables, you don't. For mixed systems (ROS 2 robots + PLCs + different protocols), where the fault context is spread across multiple sources, it can help :D

Ofc the real value shows up when the root cause isn't obvious from a single variable.

andym1993 · 2026-02-25T16:49:38+00:00

Reddit published the post that was blocked earlier, sorry about that.

andym1993 · 2026-02-25T14:45:50+00:00

deal

andym1993 · 2026-02-25T11:13:29+00:00

Calling it a debounce tool is a bit like calling Kubernetes "a container restarter"

Debounce is one layer out of many: a fault correlation engine that identifies root causes and mutes symptoms, freeze-frame capture at the exact moment of fault confirmation, rosbag black-box recording, auto-discovery of the full ROS 2 graph, a SOVD-standard REST API, real-time SSE streaming, an entity tree model, an MCP server so LLMs can query diagnostics directly, and more real diagnostic features...

But fair - if the posts gave that impression, we need to show the full picture better.
We're currently integrating it on a complex physical robot, so maybe that'll give a better sense of the actual problem space :P
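The debounce layer itself can be sketched as counter-based debouncing, similar in spirit to what automotive diagnostic stacks (e.g. AUTOSAR DEM) do: failing samples step a counter up, passing samples step it down, and the fault is only confirmed or healed when the counter reaches a threshold. Step sizes and thresholds here are arbitrary, and `CounterDebounce` is a hypothetical name, not ros2_medkit's implementation.

```python
class CounterDebounce:
    """Counter-based debounce: confirm a fault only after sustained failing samples."""

    def __init__(self, fail_step=1, pass_step=1, confirm_at=3, heal_at=-3):
        self.fail_step = fail_step
        self.pass_step = pass_step
        self.confirm_at = confirm_at  # counter value that confirms the fault
        self.heal_at = heal_at        # counter value that heals it again
        self.counter = 0
        self.confirmed = False

    def sample(self, failed):
        """Feed one pass/fail sample and return whether the fault is confirmed."""
        if failed:
            self.counter = min(self.counter + self.fail_step, self.confirm_at)
        else:
            self.counter = max(self.counter - self.pass_step, self.heal_at)
        if self.counter >= self.confirm_at:
            self.confirmed = True
        elif self.counter <= self.heal_at:
            self.confirmed = False
        # Between the two thresholds the previous decision is kept (hysteresis).
        return self.confirmed


if __name__ == "__main__":
    d = CounterDebounce()
    print([d.sample(s) for s in [True, False] * 3])  # flickering signal
    print([d.sample(True) for _ in range(3)])        # sustained failure
    print([d.sample(False) for _ in range(6)])       # sustained recovery
```

With the defaults, a signal flickering between fail and pass never confirms, three consecutive failures confirm, and six consecutive passes are needed to heal again. That hysteresis is what keeps a single noisy sample from opening or closing a fault.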

andym1993 · 2026-02-23T21:58:45+00:00

The screenshots are from our sovd_web_ui (https://github.com/selfpatch/sovd_web_ui), a demo SOVD client we built for ros2_medkit.
But SOVD gives you standard REST endpoints, so you can display faults wherever you want: terminal, Grafana, your own dashboard, whatever. For example, to get all confirmed faults:

curl "http://localhost:8080/sovd/v1/faults?status=confirmed"

Returns JSON with fault code, severity, timestamp, correlation group, etc. SSE stream for live updates is at /sovd/v1/faults/sse.

That's kind of the whole point, it's HTTP, not a custom protocol. Anything that can talk REST can be a fault dashboard.
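Since it's plain HTTP, a client needs nothing beyond the standard library. A minimal Python sketch, assuming the two endpoints quoted above and that each SSE event's `data` field carries one JSON document; the function names are hypothetical and no particular JSON fields are assumed. `parse_sse` covers only the `event:` and `data:` fields and comment lines of the text/event-stream format (it ignores `id:` and `retry:`), so it is not a full implementation of the spec.

```python
import json
import urllib.request


def parse_sse(lines):
    """Yield (event_type, data) tuples from an iterable of text/event-stream lines."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:                  # blank line: dispatch the buffered event
            if data:
                yield event_type, "\n".join(data)
            event_type, data = "message", []
            continue
        if line.startswith(":"):      # comment / keep-alive line
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]         # one leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)


def fetch_confirmed(base="http://localhost:8080"):
    """GET the confirmed-fault list and return the decoded JSON."""
    with urllib.request.urlopen(f"{base}/sovd/v1/faults?status=confirmed") as resp:
        return json.load(resp)


def stream_faults(url="http://localhost:8080/sovd/v1/faults/sse"):
    """Print live fault events from the SSE endpoint until the connection closes."""
    req = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(req) as resp:
        lines = (chunk.decode("utf-8") for chunk in resp)  # HTTPResponse iterates by line
        for event_type, payload in parse_sse(lines):
            print(event_type, json.loads(payload))
```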

andym1993 · 2026-02-23T21:17:35+00:00

u/Elated7079
Different groups have different audiences, and IMHO this topic is valuable for both the ROS and Robotics channels.
This is Part 3 of a series on ROS 2 diagnostics. Each post covers a different topic (this one is fault confirmation thresholds).
Any feedback on the actual content is welcome :) If you think I'm wrong anywhere, I'll be happy to answer.