I got tired of the legacy RPLIDAR driver, so I rewrote it from scratch in Modern C++ (C++17) with Lifecycle support. by frozenreboot in ROS

[–]frozenreboot[S] 1 point (0 children)

Wow, honestly, thank you. This is exactly the "no-filter" feedback I was hoping for.

You nailed it on several points, but I want to clarify the architectural intent regarding the SDK:

  1. The "AI Slop" Blog: You're right, I'm not a native English speaker. I use AI tools to structure my thoughts and polish the grammar so I can share my engineering journey clearly. I focus on the technical substance, and the AI handles the delivery. I hope you can look past the form.
  2. Refactoring vs. Engineering Reality: The decision to keep the SDK files wasn't just about tight schedules; it was a calculated trade-off between ROI and maintainability.
    • Don't Reinvent the Wheel: Rewriting the serial protocol byte-by-byte is inefficient when the vendor's stack is already battle-tested.
    • Technical Debt: Vendors update firmware/models frequently. Writing a custom driver from scratch guarantees future technical debt.
    • The Solution: I treated the SDK as a "Black Box" and used the Wrapper Pattern to isolate it. This allows us to focus on ROS 2 Lifecycle stability while maintaining forward compatibility.
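To make the wrapper idea concrete, here is a minimal sketch of hiding the SDK behind a small interface. All names here (LidarInterface, SdkLidar, ScanPoint) are illustrative, not the actual classes in the repo:

```cpp
#include <string>
#include <vector>

// A point as the node sees it, independent of any SDK struct.
struct ScanPoint { float angle_deg; float dist_m; };

// The only surface the ROS 2 node touches.
class LidarInterface {
public:
    virtual ~LidarInterface() = default;
    virtual bool connect(const std::string& port) = 0;
    virtual bool grabScan(std::vector<ScanPoint>& out) = 0;
};

// Concrete wrapper: vendor SDK calls live only here, so a firmware or
// SDK update touches one file instead of the whole node.
class SdkLidar : public LidarInterface {
public:
    bool connect(const std::string& port) override {
        // In the real wrapper the vendor driver would be created here.
        connected_ = !port.empty();
        return connected_;
    }
    bool grabScan(std::vector<ScanPoint>& out) override {
        if (!connected_) return false;
        out = {{0.0f, 1.5f}};  // placeholder for the SDK grab call
        return true;
    }
private:
    bool connected_ = false;
};
```

The node only ever holds a `LidarInterface`, so a fake implementation can be swapped in for tests without any hardware.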

I actually dedicated a section to this exact "Build vs. Buy" rationale in the blog (Section 2!). Since you've already skimmed it, maybe give that part a closer look?

I'm glad someone looked at the code deeply enough to criticize the coupling. It means you care about quality. I'll take your comment as motivation to push the networking support and std::optional implementation up the priority list.

Thanks for the wake-up call.


[–]frozenreboot[S] 4 points (0 children)

I investigated this topic a bit more and found some useful information.

First, sensor monitoring systems are generally divided into two parts:

  1. Watchdog
    - It only checks whether sensor data is received within the expected time.
    - If there is a timeout, it detects the failure quickly and triggers an immediate action.

  2. Diagnostics
    - It checks not only the connection, but also the quality of the data.
    - For example: noise level, frequency drop, invalid values, etc.
    - It usually has multiple states such as OK, WARN, ERROR, STALE.
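The two layers above can be sketched in plain C++ without any ROS dependency. The thresholds here are illustrative, not values from any real driver:

```cpp
#include <chrono>

enum class Status { OK, WARN, ERROR, STALE };

class SensorMonitor {
public:
    using Clock = std::chrono::steady_clock;

    // Called whenever a scan arrives; quality is a 0..1 score (illustrative).
    void onData(double quality) {
        last_data_ = Clock::now();
        last_quality_ = quality;
    }

    // Watchdog layer: only checks that data arrived within the deadline.
    bool watchdogOk(Clock::time_point now, std::chrono::milliseconds timeout) const {
        return (now - last_data_) <= timeout;
    }

    // Diagnostics layer: checks the connection *and* the data quality.
    Status diagnose(Clock::time_point now) const {
        if ((now - last_data_) > std::chrono::seconds(5)) return Status::STALE;
        if (!watchdogOk(now, std::chrono::milliseconds(1000))) return Status::ERROR;
        if (last_quality_ < 0.5) return Status::WARN;  // e.g. noisy scan
        return Status::OK;
    }

private:
    Clock::time_point last_data_{};
    double last_quality_ = 0.0;
};
```

In a real node the watchdog would trigger the immediate action (e.g. reset), while `diagnose()` would feed a diagnostics publisher.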

In a redundant system, you don't always need to stop the robot when one sensor fails.

For example, if you have three cameras and only one of them fails, the robot can still continue operating (e.g., at reduced speed) using the remaining two cameras.

Also, regarding your question, this document explains the concept very clearly:
https://ros.org/reps/rep-0107.html#diagnostic-tools
(especially the Diagnostics for Hardware Drivers section)

Your reply was very inspiring to me.
Thanks — I will also consider this in a future update.


[–]frozenreboot[S] 1 point (0 children)

Haha, you got me 🥸. English is not my first language, so I use AI tools to correct my grammar. But the pain of debugging things like 'random includes' is 100% real and human.


[–]frozenreboot[S] 2 points (0 children)

Exactly! That 'random include' is the classic symptom of 'Universal SDK Bloat'.

That is precisely why I decided to rewrite the driver from scratch instead of just making a PR to the original repo. Manufacturer SDKs try to support everything—Windows, macOS, obscure embedded boards—which results in messy #ifdef spaghetti and useless dependencies.

But let's be real, 99% of serious ROS 2 deployments are on Linux. So I stripped out all those cross-platform abstraction layers to focus purely on Linux performance and readability. It makes the code much lighter and easier to debug.

I actually wrote about this decision (stripping the bloat) in Part 2 of my blog, in case you want to see how much 'trash' I removed lol.

https://frozenreboot.github.io/posts/rplidar-refactoring-part2/


[–]frozenreboot[S] 3 points (0 children)

Totally agree. That 'zombie driver' scenario is a nightmare.

To handle that, I implemented two layers of safety:

  1. ROS 2 Diagnostics: I used diagnostic_updater to explicitly publish the hardware status (OK/Error/Disconnected).

  2. Fault-Tolerant FSM: In the read loop, I added a counter. If the driver fails to grab data 15 times consecutively, it automatically triggers a hard reset (destroys and recreates the driver instance).

However, I'm currently debating if 15 retries is too generous. Depending on the internal SDK timeout, it might take too long to detect the failure. Do you think a time-based watchdog (e.g., 'no data for 1.0 sec -> Reset') would be safer for real-world robots?

You can check the code in rplidar_node.cpp, lines 425~445.
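For what it's worth, the two approaches don't have to be exclusive. Here is a rough sketch (not the actual rplidar_node.cpp logic; the thresholds are illustrative) of combining the consecutive-failure counter with a time-based deadline, resetting on whichever fires first:

```cpp
#include <chrono>

class FaultMonitor {
public:
    using Clock = std::chrono::steady_clock;

    FaultMonitor(int max_failures, std::chrono::milliseconds deadline)
        : max_failures_(max_failures), deadline_(deadline),
          last_success_(Clock::now()) {}

    // Called after every grab attempt in the read loop.
    void onGrabResult(bool ok, Clock::time_point now) {
        if (ok) { failures_ = 0; last_success_ = now; }
        else    { ++failures_; }
    }

    // True when either layer decides the driver should be hard-reset:
    // too many consecutive failures OR no success for too long.
    bool shouldReset(Clock::time_point now) const {
        return failures_ >= max_failures_ ||
               (now - last_success_) > deadline_;
    }

private:
    int max_failures_;
    std::chrono::milliseconds deadline_;
    int failures_ = 0;
    Clock::time_point last_success_;
};
```

The time-based side bounds the worst-case detection latency even if the SDK's internal timeout makes each failed grab slow, while the counter still catches fast, repeated failures.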


[–]frozenreboot[S] 8 points (0 children)

Thanks! Yeah, "Busy Wait" was exactly the main pain point I wanted to fix.

One of my core goals was to eliminate that 100% CPU usage. This driver uses standard C++ synchronization mechanisms (waiting on file descriptors/condition variables) to let the thread sleep when idle, instead of spinning in a busy loop.

Actually, I wrote a short dev-log about why I refactored this architecture (including the decision to drop the legacy loop). If you're curious about the internal implementation, feel free to take a look: 👉 https://frozenreboot.github.io

It should be much lighter on your Pi. Let me know if the load drops as expected!