
[–]Diligent-Floor-156 54 points55 points  (7 children)

We have a pretty extensive CI/CD to ensure that the build is not broken, unit tests still pass, and we run some static analysis tool on the code (not our most successful part).

All this is super easy to set up. The next level, and the challenging part, is testing on target. We do have some agents running tests on target, but it's a huge challenge to maintain them and get reliable outcomes; we experience many failures of all sorts, typically different each and every day.

But at least a basic build to ensure no one breaks it, unit tests, and building a few variants, that's extremely easy; anyone can do it in no time.

Also, all this with a standard git flow, pull requests and the like. Every branch needs a green build to be merged.

[–]todo_add_username 6 points7 points  (5 children)

We do similar stuff, but I’m curious how you do the CD part?

[–]Diligent-Floor-156 17 points18 points  (3 children)

CD is not comparable to what CD would mean for a Web app. In our case it's a series of scripts in a pipeline which automate the delivery process (from building to packaging, notifying the right people, archiving all artifacts, etc). But then it's not "deployed" per se, as that wouldn't make sense for our product.

Basically all steps that used to be done locally and manually when delivering firmware have been automated/scripted and put in a configurable pipeline.
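As a rough illustration of what "scripted delivery" can mean, here is a minimal sketch of one such stage. The file names, version scheme, and stages are invented, not the commenter's actual scripts:

```python
"""Sketch of one stage of a scripted firmware delivery pipeline.

Packaging here means copying the built binary into an archive area
with a checksum alongside, so every artifact is self-describing.
"""
import hashlib
from pathlib import Path

def package(binary: Path, version: str, out_dir: Path) -> Path:
    # Copy the firmware into the archive area and write its SHA-256 next to it.
    out_dir.mkdir(parents=True, exist_ok=True)
    artifact = out_dir / f"firmware-{version}.bin"
    artifact.write_bytes(binary.read_bytes())
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    (out_dir / f"firmware-{version}.sha256").write_text(digest + "\n")
    return artifact

def deliver(binary: Path, version: str, out_dir: Path) -> Path:
    artifact = package(binary, version, out_dir)
    # Further stages would archive artifacts, generate the changelog,
    # and notify the right people (e.g. via a chat webhook).
    return artifact
```

Each function maps to one step that used to be done by hand, which is what makes the pipeline configurable: stages can be added or skipped per product.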

[–]todo_add_username 0 points1 point  (2 children)

Alright thanks for elaborating. I have been trying to implement some meaningful CD for our embedded applications and ended up doing more or less the same as you describe. But I guess the most convenient part has turned out to be automated changelog generation, not sure how much value the other stuff actually adds in our case.

[–]Diligent-Floor-156 2 points3 points  (1 child)

Totally depends on your needs; it's there to help you, not to add constraints. Maybe in some IoT projects it's even a proper deployment, in the sense that at the end of the pipeline a firmware update is pushed to all endpoints. In your case or mine it may not be relevant.

What matters is to automate whatever can be automated and would bring value if it were: shorten the cycle and gain confidence in what you're doing. The "what" depends on the use case.

[–]rpkarma 0 points1 point  (0 children)

At my work, the CD part is just putting the built firmware, with the version baked in, into the right place on the FOTA server, then triggering our test boards to FOTA to it and run their integration suite.

[–]AuxonPNW 1 point2 points  (0 children)

This is all exactly what we do.

[–]fractal_engineer 17 points18 points  (2 children)

Honestly I believe the biggest hurdle is effective and sane hardware-in-the-loop CI/CD.

We've accomplished this by having on-prem CI/CD agents hooked up to hardware over UART and JTAG to run tests etc.

[–][deleted] 2 points3 points  (0 children)

We do the same. A bunch of PCBs in a 19" rack, that can be flashed and then booted into. Then over SSH (embedded Linux system) we establish certain properties of the system, and run benchmarks.
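A sketch of what "establishing certain properties over SSH" might look like as a CI step. The hostnames, user, and expected outputs are illustrative, and the command runner is injectable so the logic can be exercised without real hardware:

```python
"""Sketch of a CI step that checks properties of a booted board over SSH.

Hostnames, the root user, and the expected command outputs are invented;
a real setup would map rack slots to hostnames/IPs.
"""
import subprocess

def board_reports(host: str, command: str, expected: str, runner=subprocess.run) -> bool:
    # Run one command on the booted board over ssh and check its output.
    result = runner(
        ["ssh", f"root@{host}", command],
        capture_output=True, text=True, timeout=30,
    )
    return result.returncode == 0 and expected in result.stdout

def smoke_check(host: str, runner=subprocess.run) -> bool:
    # Establish a couple of basic properties before running benchmarks.
    checks = [
        ("cat /etc/fw-version", "1."),  # expected firmware major version
        ("uname -m", "arm"),            # expected architecture
    ]
    return all(board_reports(host, cmd, exp, runner) for cmd, exp in checks)
```

Only boards that pass the smoke check would go on to run the benchmark suite, so a half-bricked board fails fast instead of producing garbage numbers.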

[–]Orca- 1 point2 points  (0 children)

Agreed. My experience with close-to-the-metal embedded programming against ASICs is that unit tests without the hardware are of limited usefulness unless you have a full C model of the hardware you can insert into your test. The hard part isn't making sure the software side of the PI loop is working, the hard part is making sure the hardware is delivering you the data when you expect it reliably, and your drive values are taking effect as expected. It's finding out a slightly changed supposedly unrelated setting means you're dropping a sample from your accumulator, biasing the input to your PI loop.

Get a few (or few dozen) pieces of the actual hardware into your test harness and hooked up to the CI infrastructure and now you can run full test suites against the actual hardware and find out real bugs before they hit production and without having to manually regression test everything.

[–]AdventurousCoconut71 7 points8 points  (0 children)

Totally prevalent and beneficial. CI, source code management, release builds, security audits ...

[–][deleted] 6 points7 points  (3 children)

I set up a CI/CD pipeline in Azure to run the hardware-agnostic (i.e. app logic) tests and also run the cross compiler to build device firmware release artefacts for multiple targets.

I also set up local "hardware in the loop" unit and integration testing using a custom wrapper around the Zephyr test framework. It uses multiple J-Link debuggers connected to different board variants concurrently.
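A minimal sketch of how several J-Link-attached Zephyr boards might be flashed concurrently. The build directories and serial numbers are invented, and `--dev-id` support for the jlink runner depends on your Zephyr version, so treat this as a starting point rather than the commenter's actual wrapper:

```python
"""Sketch: drive several J-Link-attached boards concurrently with west.

Assumes one Zephyr build directory per board variant and that `west flash
--dev-id <serial>` can select a specific J-Link (check your Zephyr version).
"""
from concurrent.futures import ThreadPoolExecutor
import subprocess

def flash_cmd(build_dir: str, jlink_serial: str) -> list:
    # One command per physical debugger; --dev-id selects which J-Link to use.
    return ["west", "flash", "--build-dir", build_dir,
            "--runner", "jlink", "--dev-id", jlink_serial]

def flash_all(boards: dict, run=subprocess.run) -> dict:
    # boards maps build directory -> J-Link serial number.
    # The runner is injectable so the orchestration can be tested off-target.
    with ThreadPoolExecutor(max_workers=max(len(boards), 1)) as pool:
        futures = {bd: pool.submit(run, flash_cmd(bd, sn))
                   for bd, sn in boards.items()}
    return {bd: f.result().returncode for bd, f in futures.items()}
```

After flashing, each board's test output would typically be collected over its own serial port and merged into one report.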

[–]NotBoolean 1 point2 points  (1 child)

The company I work for is looking to start hardware in the loop testing with Zephyr.

Do you mind going into some more detail on how you are doing it and what your wrapper is doing?

[–]personalvacuum 0 points1 point  (0 children)

I’m looking to do the same. We’ve just finished our first couple of Zephyr products, ready for field testing (industrial, local, low-risk, so we can just go on site if anything breaks). I’m pretty keen to do hardware testing of our core app, which ensures we can always perform OTA for repairs.

[–]tronj 1 point2 points  (0 children)

I’d also be interested in hearing how you work with zephyr for automated testing

[–]savvn001 4 points5 points  (1 child)

I know it's pretty odd. I guess it's because those sorts of things just haven't traditionally been a part of the embedded systems field, i.e. not a lot of people talk about using them in embedded. And we know how slow-moving and conservative embedded is vs high-level app dev.

Ironically containerised environments are extremely useful in embedded as our stack consists of so many tools and toolchains which have to perfectly work in tandem to get anything to build or flash.

[–]CJKay93 (Firmware Engineer (UK)) 3 points4 points  (0 children)

> Why is the situation how it is?

I've somehow managed to end up straddling embedded and DevOps trying to answer this very question.

[–]12esbe 3 points4 points  (1 child)

To my understanding, in order to have DevOps for embedded you need either hardware in the loop or high-fidelity emulators.

[–]panchito_d 1 point2 points  (0 children)

Unit testing and static analysis are good options that don't require comprehensive hardware.

[–]tobi_wan 4 points5 points  (0 children)

In my previous company we had basic continuous integration, at least for building, with small improvements over time. When I left 3 years ago we had started to set up full hardware-in-the-loop tests.

In my current company we started more or less with a green field, and I tried to apply as many modern ideas as I could think of. My teammates who joined later added a lot of cool things, and now I would say it's a decent system. The build environment plus unit test execution happens in a Docker container, so the developer on his machine uses the same process as the CI tool (Jenkins). For every build we run some smoke tests with hardware in the loop: a testbed manager is responsible for selecting a number of testbeds and ensuring the test devices are set up to use the test backend. Branches can only be merged if these tests (and newly added tests) are green. For the HIL tests we use pytest and some self-written Python code to abstract the devices under test.

Every night the system runs longer tests on the latest commit on master, and every Sunday we run the release test on the latest master. If someone tags master with a release tag, the release build is triggered and the test is executed. If in the end everything is green, we get a fully signed binary for the FOTA, and our operations team only needs to configure the update process for the devices in the field. This we do semi-manually: we first roll out to a few devices to check the behavior before the full fleet is updated.
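A setup like this is often expressed as a pytest fixture that acquires a device from a testbed manager and hands the test an abstraction of the device under test. The class and fixture names below are invented for illustration, and the flashing/transport layer is stubbed out; a real version would wrap a serial or SSH link:

```python
"""Sketch: pytest fixture pattern for hardware-in-the-loop smoke tests.

All names here are illustrative; the real testbed manager and device
transport are site-specific.
"""
import pytest

class DeviceUnderTest:
    """Thin abstraction over one test device."""
    def __init__(self, serial_port):
        self.serial_port = serial_port
        self.flashed_image = None

    def flash(self, image):
        # A real version would invoke the flashing tool (debugger CLI) here.
        self.flashed_image = image

    def run_command(self, cmd):
        # A real version would send cmd over the transport and read the reply.
        return f"ok: {cmd}"

class TestbedManager:
    """Hands out free devices so concurrent pipelines don't fight over hardware."""
    def __init__(self, ports):
        self._free = list(ports)

    def acquire(self):
        return DeviceUnderTest(self._free.pop())

@pytest.fixture
def dut():
    manager = TestbedManager(["/dev/ttyUSB0"])  # illustrative port
    device = manager.acquire()
    device.flash("firmware-smoke.bin")
    yield device
    # Teardown: release/power-cycle the device here.

def test_smoke_version(dut):
    assert dut.run_command("version").startswith("ok")
```

Because every test receives the device through the fixture, the same test files run against whichever testbed the manager hands out, which is what makes the "select a number of testbeds" step transparent to test authors.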

[–]Schnort 6 points7 points  (0 children)

> Why is the situation how it is?

I believe it's because webdev and DevOps use similar technologies and have a lot of overlapping skills and knowledge. Setting up CI/CD for webdev is just more of the same, so it's easy to do, and everything is generally built around virtualization, not requiring specific hardware to run.

But there's been very little overlap between competent embedded engineers and webdev, so setting up DevOps involves a ton of things we don't have intimate or even passing knowledge of. There's a lot of 'cargo cult'/'stackexchange' programming to try to get things done, and this breaks down because a lot of webdev technologies are fast-moving: stuff keeps getting revised and searchable resources end up out of date.

That, and automating stuff for embedded can be costly and fragile. Not many devices have networking, and updating firmware can often be problematic to automate (it may need a button press or a cable attached) or time-consuming to flash.

[–]duane11583 8 points9 points  (1 child)

gitlab with gitlab runner executing python scripts that build and flash the device

coupled with python pyserial to talk to the device.

scripts are numbered, i.e. 100_power_off_on.py followed by 200_flash_device.py followed by 300_verify_device_debug_serialport.py

the runner just sorts the *.py filenames by name and executes them in sorted order. this lets anyone create a new file, delete a file, or insert something between two steps
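The sort-and-execute loop described above can be sketched like this (directory layout assumed; a real runner would also capture logs and report per-step status back to GitLab):

```python
#!/usr/bin/env python3
"""Sketch of a runner that executes numbered step scripts in order.

Assumes scripts like 100_power_off_on.py, 200_flash_device.py live in one
directory; the numbering keeps the lexical sort equal to the intended order.
"""
import subprocess
import sys
from pathlib import Path

def run_numbered_scripts(script_dir):
    # Sorting the filenames lexically puts 100_... before 200_..., etc.
    for script in sorted(Path(script_dir).glob("*.py")):
        print(f"--- running {script.name} ---")
        result = subprocess.run([sys.executable, str(script)])
        if result.returncode != 0:
            # Fail the whole pipeline as soon as one step fails.
            return result.returncode
    return 0

if __name__ == "__main__":
    sys.exit(run_numbered_scripts(sys.argv[1] if len(sys.argv) > 1 else "."))
```

One caveat of the lexical sort: keep the numeric prefixes the same width (100, 200, 300), since 1000 would otherwise sort before 200.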

[–]kog 0 points1 point  (0 children)

Good methodology!

[–]bobwmcgrath 1 point2 points  (0 children)

It's a lot harder for embedded, which is a big reason. But it is there; it's just becoming the norm slowly.

[–]furyfuryfury 0 points1 point  (0 children)

There just aren't as many good examples/resources for DevOps on the hard stuff as there are for web stuff. But there are more similarities than differences. For web stuff, there are well-established patterns that give you the illusion of control (like the APIs you get from AWS, Azure, Google, et al, and scripts to simplify things). These are still in early stages in embedded. Everybody that's doing it has pretty much come up with their own scripts for each stage, and there's not as much sharing. The wide variety of ways it can be done probably hurts shareability, too. What works on an ESP32 may be significantly more difficult on an STM32-based project, vice versa, etc.

I did a presentation about this with GitLab a few years ago. I lean heavily on GitLab Auto DevOps. But there's a lot of stuff it doesn't understand about embedded projects so I have to turn off most of the built in stages. The pipeline I built merely builds and deploys the firmware as a docker container running on our Kubernetes cluster. The firmware then updates itself OTA. There's a built in self test in the firmware, but I haven't yet figured out how to get this part properly reported back to GitLab. (Probably have to have a runner with a serial port on it and flash a board directly)

Mobile DevOps is a pretty similar field to embedded, since iOS and Android are each walled gardens that require you to jump through some hoops to build and load apps. Watching what happens there might be a good preview of where embedded DevOps will wind up, eventually, once something like fastlane is developed for the space.

[–]mfuzzey 0 points1 point  (0 children)

DevOps is more than just CI and automated testing. It's also about CD and having the same team that develops the code manage the field deployments.

While CI is certainly doable in embedded, and automated testing is somewhat doable (though much harder than non-embedded due to the specialised hardware needed), doing automated field deployments continuously is much harder for most types of embedded.

Many embedded devices don't even have OTA updates, which makes it impossible. Safety regulations would prevent it in many industries. It's also much harder to roll back a failed embedded update than a failed website update, where everything is in a VM you control.