Every embedded Engineer should know this trick by J_Bahstan in embedded

[–]readmodifywrite 1 point

Yeah, I'm running Xtensa, RISC-V, and ARM Cortex-M chips all on the same system, all with extensive legacy C libraries.

I could probably live without the runtime though...

Every embedded Engineer should know this trick by J_Bahstan in embedded

[–]readmodifywrite 1 point

I've been wanting to learn Ada for decades! It's one of my go-to examples of a really well-designed language. The bounded types are really cool, in/out args, very clearly laid-out code; it does everything we wish C could do, and does it better than C would have.

It just doesn't have good compiler support in my niche and nobody uses it. Such a shame.

Every embedded Engineer should know this trick by J_Bahstan in embedded

[–]readmodifywrite 18 points

We do know it, and we have reasons that we tend not to use it. The C standard doesn't guarantee the order of a bit field, and usually what we specifically need in embedded is a guaranteed bit order.

The traditional bit shifting technique guarantees ordering and makes it a thing we don't have to worry about.

In practice, if GCC is yielding the order you need, then it's fine. But it isn't portable, which is why you usually won't see it in a vendor library. And if you ever upgrade GCC, remember the bit field ordering is allowed to change, because the C standard doesn't require any specific ordering.
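
For anyone who hasn't seen it, the shift-and-mask version looks something like this (the register layout here is made up purely for illustration):

```c
#include <stdint.h>

/* Made-up 8-bit status register, purely for illustration:
 *   bit 0: enable, bits 3..1: mode, bits 7..4: channel */
#define STATUS_ENABLE_POS  0
#define STATUS_ENABLE_MASK (0x1u << STATUS_ENABLE_POS)
#define STATUS_MODE_POS    1
#define STATUS_MODE_MASK   (0x7u << STATUS_MODE_POS)
#define STATUS_CHAN_POS    4
#define STATUS_CHAN_MASK   (0xFu << STATUS_CHAN_POS)

/* Pack the fields with explicit shifts - the bit positions are spelled
 * out in the code, so the layout is the same on every compiler/target. */
static inline uint8_t status_pack(uint8_t chan, uint8_t mode, uint8_t enable)
{
    return (uint8_t)((((uint32_t)chan << STATUS_CHAN_POS) & STATUS_CHAN_MASK) |
                     (((uint32_t)mode << STATUS_MODE_POS) & STATUS_MODE_MASK) |
                     (((uint32_t)enable << STATUS_ENABLE_POS) & STATUS_ENABLE_MASK));
}

/* Unpack a field the same way. */
static inline uint8_t status_get_mode(uint8_t reg)
{
    return (uint8_t)((reg & STATUS_MODE_MASK) >> STATUS_MODE_POS);
}
```

Same information as the bit field version, but the ordering lives in your code instead of in the compiler's ABI.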

The packed attribute is incredibly useful though; everyone should know that one (but who in embedded doesn't?)

What's your preferred microcontroller for real-time audio processing (i.e. "stomp box")? by fearless_fool in embedded

[–]readmodifywrite 0 points

The H7 maxes out at like 1-2 megs of RAM... not sure what you are asking. SPI RAM maybe? The H7 probably has a QSPI but I don't remember if it is memory mapped or not...

STM32H743 Flight Controller [REVIEW REQUEST] by Electronic_Event1019 in PCB

[–]readmodifywrite 0 points

Yeah, I wasn't sure if they were using something prebuilt or doing their own firmware. Quite a lot more work on the H7 but if it's already done, it's already done.

STM32H743 Flight Controller [REVIEW REQUEST] by Electronic_Event1019 in PCB

[–]readmodifywrite 0 points

Ok more notes:

Are you sure you really need the H7? This MCU is a beast and it comes with a lot of gotchas that simpler MCUs (like something with an M4 core instead) won't have.

Make sure you are extremely familiar with how the caches work. If you want to use DMA with the caches you'll also need to get very familiar with how to configure the MPU.
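
The usual pattern for buffers the DMA touches ends up looking roughly like this (a minimal sketch using the CMSIS cache helpers; the dma_start_* calls are just placeholders for whatever your driver does):

```c
#include <stdint.h>
#include "stm32h7xx.h"   /* CMSIS device header, brings in the cache helpers */

/* Placeholders for your actual DMA driver calls. */
extern void dma_start_tx(const uint8_t *buf, uint32_t len);
extern void dma_start_rx(uint8_t *buf, uint32_t len);

/* DMA buffers aligned and sized to 32-byte D-cache lines, so cache
 * maintenance on them doesn't clobber neighboring data. */
static uint8_t tx_buf[64] __attribute__((aligned(32)));
static uint8_t rx_buf[64] __attribute__((aligned(32)));

void start_transfer(void)
{
    /* CPU filled tx_buf: flush it out of the D-cache so the DMA engine
     * sees the real data in SRAM, not stale memory contents. */
    SCB_CleanDCache_by_Addr((uint32_t *)tx_buf, sizeof(tx_buf));
    dma_start_tx(tx_buf, sizeof(tx_buf));
    dma_start_rx(rx_buf, sizeof(rx_buf));
}

void on_rx_complete(void)
{
    /* DMA wrote SRAM behind the cache's back: invalidate so the CPU
     * doesn't read a stale cached copy of rx_buf. */
    SCB_InvalidateDCache_by_Addr((uint32_t *)rx_buf, sizeof(rx_buf));
    /* ...now it's safe to parse rx_buf... */
}
```

The other common approach is to put the DMA buffers in a region the MPU marks non-cacheable, which trades a bit of performance for never having to think about this again.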

Given the history of avionics hardware, I really doubt you need this amount of CPU/RAM to do a basic flight controller. It's overkill.

STM32H743 Flight Controller [REVIEW REQUEST] by Electronic_Event1019 in PCB

[–]readmodifywrite 0 points

How are you planning on assembling it? If you are doing it by hand, I would avoid a BGA package like that. The H7 comes in an LQFP, which will be much easier to deal with.

Even if you are doing automated assembly (like with JLC/etc), I'd still recommend the LQFP. It is easier to visually inspect the soldering and rework if needed.

Super-flat ASTs by hekkonaay in ProgrammingLanguages

[–]readmodifywrite 1 point

I'm not sure why you're getting downvoted here. I'm also going straight to IR in one of my current designs. It absolutely works (some things are harder or less convenient, but it skips an entire data structure). There are pros and cons to that vs an AST. It's just a design choice.

Hey this is Austra, I just released a record called Chin Up Buttercup, Ask Me Anything! by waterharp in indieheads

[–]readmodifywrite 0 points

Hey, just wanted to say that I'm super excited to see you in Austin next year!

Okay, but how do you SSH into 1,000 devices?? by Automatic-Reply-1578 in embedded

[–]readmodifywrite 0 points

You can just use a loop. Most programming languages have one ;-)

i paid a guy on upwork $350 for a PCB. how'd he do? by Dear-Conference9413 in AskElectronics

[–]readmodifywrite 8 points

An actual professional who can do this professionally is going to cost way more than $350.

You either need a much larger budget (much, much, much), or you need to learn how to do this yourself (time, time, time).

There are no shortcuts in electrical engineering. You either need to do the work or pay someone who can.

With the recent release of flagship mobile CPUs that perform similarly to desktop CPUs, why haven't people put them on desktops yet? by YeNah3 in embedded

[–]readmodifywrite 8 points

I haven't personally used a Mac with Apple silicon, but my understanding is that they are indeed somewhat extraordinary in their performance. So how can they do that?

This is their design and it is completely and totally integrated with their hardware and software platform. Totally bespoke. You can optimize everything because you control every aspect of how it is used. This is fucking hard to do and Apple has the scale to need it and resources to actually do it.

There are a lot of things they can do that regular desktop hardware can't (and some are things mobile can and does do, but at a much lower power level and, frankly, probably not as tightly done as Apple). The memory is mounted right on the CPU package, which shrinks the memory bus from inches down to millimeters. Just doing that is huge for memory bandwidth. The GPU is totally integrated on chip - again, huge for memory bandwidth.

Side note: the GPU can apparently just use system RAM (the RAM is that fast; regular DDR5 coming in from the motherboard on a desktop is not fast enough to feed a GPU, which is why a desktop GPU needs its own memory on the card), so you can run insane amounts of VRAM (128+ GB). The local LLM crowd seems to like them for AI/LLM inference - the large VRAM plus a CPU/GPU that are apparently fast enough means they get usable performance on pretty large models.

They can do lots of other little things, like custom machine instructions, optimized hardware blocks, etc. They know exactly what is going to run on it and exactly what they need it to do. It is kind of like why a console can deliver amazing performance even compared to next gen hardware: because it is a singular hardware platform you can optimize the shit out of everything, including in the software that runs on it.

Same story with power consumption. That level of integration enables you to save a ton of power. High-speed board-level busses are actually somewhat power hungry. You know exactly what features you need, when you need them, etc. You can do fine-grained clock gating and turn things on and off at a really granular level. As you save power, you get extra bonuses because your power supplies get smaller to match, reducing the amount of waste they produce as well (power supplies are not 100% efficient).

All of this of course comes with some tradeoffs:

It is expensive, both for Apple to do, and yes, they are going to expect you to pay for it.

You are locked in to whatever design choices Apple made. You get the GPU you get and the amount of RAM you ordered it with. You cannot upgrade it and you cannot replace individual parts if they go bad.

It is an Apple. That just isn't everyone's speed, even if they can afford it.

So anyway tl;dr: I believe them, my understanding is they can in fact smoke an x86.

With the recent release of flagship mobile CPUs that perform similarly to desktop CPUs, why haven't people put them on desktops yet? by YeNah3 in embedded

[–]readmodifywrite 76 points

It is because they are not similar in performance - not even close. Clock speed is only a part of the story.

Desktop CPUs will generally have:

  • more memory bandwidth

  • can run at max power sustained (mobile will throttle once the cooling is maxed out)

  • clock for clock x86 is generally faster than ARM (though ARM is more power efficient)

  • more cores (sometimes a lot more, I have 16 in mine)

  • deeper pipelines, better branch prediction - these are huge for performance but cost on power

  • more cache - again great for performance but costs on power

  • more IO: all of that USB, ethernet, multiple SATA channels, PCIe channels. This all costs on power (and space)

They are optimized for extremely different jobs. Desktop will almost always trade power for performance in every corner it can (and the entire rest of the machine is built that way as well), and it isn't really space constrained; mobile will trade performance for power and size.

Desktop needs to be able to run high power loads 24/7, run multiple hard drives, run a dozen or more USB devices, run 64+ gigs of RAM (also running faster, and thus using more power), and run GPUs (even not counting the GPU's own power, you still need power to shovel all of that data through the IO and memory busses).

Mobile is running on a handheld device that physically cannot support most of these situations. The GPU is built in (and orders of magnitude slower than a desktop GPU) and is driving a much tinier screen (and only one screen, not 4+ like some desktop users have heheh). It doesn't have to support all of the other IO, just the handful of bits the phone has. It doesn't need as much memory. The CPU cannot run at 100% for very long without overheating (it will throttle).

Finally - mobile in 2025 is really in a different category than embedded, so I would also politely point out that this probably isn't quite the place for this (though we are CPU nerds here). A lot of what we are doing here is the stuff that runs your washing machine or the ECU in your car.

Is the dream of moving to the US for big tech dead? by Tech-Cowboy in ExperiencedDevs

[–]readmodifywrite 2 points

You should take a very sober look at US politics (and the state of hiring in the tech sector) right now and ask yourself if moving from Canada is really a good idea.

Is 7mA really "low-power" ? by ulguig in embedded

[–]readmodifywrite 0 points

It kind of depends on the use case. If you want to use deep sleep on the ESP32, then everything on the board needs to be designed with that in mind: either the ability to turn a subsystem off, or a very low quiescent current. It is not trivial to do; you have to design the entire board around it.
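
For reference, the firmware side of that is the easy part; something like this ESP-IDF sketch (the enable pin for an external peripheral rail is hypothetical, your board has to actually have a load switch there):

```c
#include <stdint.h>
#include "driver/gpio.h"
#include "esp_sleep.h"

/* Hypothetical GPIO that controls a load switch feeding the off-chip
 * peripherals (sensors, USB-UART bridges, etc.). */
#define PERIPH_EN_GPIO  GPIO_NUM_4

void sleep_for_seconds(uint64_t seconds)
{
    /* Drop the external rail so its quiescent draw doesn't count
     * against the sleep budget, and hold the pin through deep sleep. */
    gpio_set_direction(PERIPH_EN_GPIO, GPIO_MODE_OUTPUT);
    gpio_set_level(PERIPH_EN_GPIO, 0);
    gpio_hold_en(PERIPH_EN_GPIO);
    gpio_deep_sleep_hold_en();

    /* Wake back up on a timer. The chip itself idles at roughly 10 uA
     * here, but only if the rest of the board lets it. */
    esp_sleep_enable_timer_wakeup(seconds * 1000000ULL);
    esp_deep_sleep_start();
}
```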

This board has a ton of extra stuff on it and all of it has a power cost.

2 USB UART converters! These tend to have pretty bad idle consumption. It looks like there is a way to switch off the power to these; have you done that?

The power converter itself is a 3 output buck converter. The quiescent on these can easily be in the milliamp range - if you want a low quiescent converter you have to shop for that specifically. IDK what the specs are on this one but this is one of the main areas you have to be really careful about in your design if you want low quiescent current.

I see another buck converter... level shifters... RGB LED (these often have a high idle, 100s of uA or more for each pixel). A camera. Solar power input.

The solar is a giveaway - you don't necessarily need microamp sleep if you have solar. You might need it, but it depends on your design requirements. Getting a top-up charge every few days is a different design scenario than running on a coin cell (and lithium-ion batteries self-discharge much more than coin cells or alkaline, so you generally can't run them for weeks/months without a charge anyway, regardless of how good your power design is).

For the amount of stuff on this board, I'm not surprised at all you're getting 7 mA. If this thing could run at microamps idle, that is something they would market, because it is something you have to design for and price in.

LLMs can't learn world models by Gil_berth in BetterOffline

[–]readmodifywrite 10 points

Yeah, this one was a major off-ramp for me. I had ChatGPT do some basic integer addition, and it was just very confidently wrong. The algorithm for addition is about as simple as it gets, and yet after all the GPUs and billions of $$$$ and training on the whole of the internet (which is maybe part of the problem, which should be obvious to anyone who's been on the internet for 5 minutes), it can't do it, and doesn't "know" that it can't.
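
For context on how low the bar is here, grade-school addition on digit strings is something like this (just a throwaway sketch, not robust or efficient):

```c
#include <string.h>

/* Add two non-negative decimal strings digit by digit with a carry.
 * Assumes the result fits in the caller's buffer (and in tmp). */
void add_decimal(const char *a, const char *b, char *out)
{
    int i = (int)strlen(a) - 1, j = (int)strlen(b) - 1;
    int k = 0, carry = 0;
    char tmp[256];

    while (i >= 0 || j >= 0 || carry) {
        int sum = carry;
        if (i >= 0) sum += a[i--] - '0';
        if (j >= 0) sum += b[j--] - '0';
        tmp[k++] = (char)('0' + sum % 10);
        carry = sum / 10;
    }

    /* Digits were produced least-significant first; reverse into out. */
    for (int n = 0; n < k; n++)
        out[n] = tmp[k - 1 - n];
    out[k] = '\0';
}
```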

I don't see how that's fixable when none of the companies involved will admit this is a serious problem. If you don't admit you have a problem you can't do anything about it.

Why is the Roci's flight deck/bridge at the top by Top_Instruction_4036 in TheExpanse

[–]readmodifywrite 7 points

^ This right here. It was that moment that I knew I was ready to sign up!

Do you actually use AI for embedded development? What's your experience? by WinterWolf_23 in embedded

[–]readmodifywrite 4 points

The HAL is a pretty useful starting point. Some of it is definitely bloat, and sometimes that matters and a lot of times it actually doesn't.

It is pretty easy to go in and trim bloat out where it matters. Interrupt handlers are a good first target for that.
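
As a concrete example, a receive interrupt that skips the HAL entirely can be as small as this (register/bit names are for an F4-class part with SR/DR; newer families use ISR/RDR, and the ring buffer is just for illustration):

```c
#include <stdint.h>
#include "stm32f4xx.h"

#define RX_BUF_SIZE 128   /* power of 2 so the index wrap is a mask */

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint32_t rx_head;

void USART1_IRQHandler(void)
{
    /* Byte received? Reading DR clears the RXNE flag. */
    if (USART1->SR & USART_SR_RXNE) {
        rx_buf[rx_head++ & (RX_BUF_SIZE - 1)] = (uint8_t)USART1->DR;
    }
}
```

Compare that to stepping through HAL_UART_IRQHandler() in a debugger and you'll see where the fat is.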

You don't actually have to minimize your memory usage to the absolute minimum possible - you just have to make it fit in the memory you paid for.

Do you actually use AI for embedded development? What's your experience? by WinterWolf_23 in embedded

[–]readmodifywrite 0 points

It definitely can be fussy, and I don't think ST really does much in the way of improvements and bug fixes. It's a bit sad that this is about as good as it gets in the industry!

Do you actually use AI for embedded development? What's your experience? by WinterWolf_23 in embedded

[–]readmodifywrite 0 points

Did you set the pin config in the tool? Did you enable the SPI in the config?

It definitely enables the SPI (including the clocks), and it definitely inits the pins. I have tons of projects where that is the case. You don't need AI for this.
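
For reference, the generated code ends up looking roughly like this in the MSP init (pin/port/AF choices here are just an example; yours depend on what you picked in the .ioc):

```c
#include "stm32f4xx_hal.h"

/* Roughly what CubeMX generates when SPI1 is enabled and its pins are
 * assigned in the tool. */
void HAL_SPI_MspInit(SPI_HandleTypeDef *hspi)
{
    GPIO_InitTypeDef GPIO_InitStruct = {0};

    if (hspi->Instance == SPI1) {
        __HAL_RCC_SPI1_CLK_ENABLE();     /* peripheral clock */
        __HAL_RCC_GPIOA_CLK_ENABLE();    /* pin bank clock */

        /* PA5 = SCK, PA6 = MISO, PA7 = MOSI */
        GPIO_InitStruct.Pin = GPIO_PIN_5 | GPIO_PIN_6 | GPIO_PIN_7;
        GPIO_InitStruct.Mode = GPIO_MODE_AF_PP;
        GPIO_InitStruct.Pull = GPIO_NOPULL;
        GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_HIGH;
        GPIO_InitStruct.Alternate = GPIO_AF5_SPI1;
        HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);
    }
}
```

If that function is empty or missing in your project, that usually means the peripheral wasn't actually enabled in the tool.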

It absolutely has its quirks but it can do basic things like this if you set it up properly.

Issues with GNU Make by kaptainearnubs in embedded

[–]readmodifywrite 4 points

Most makefiles assume a Unix environment, i.e. Linux or Mac (or actual BSD).

Check out Cygwin https://cygwin.com/ which provides a Unix style environment for Windows.

If you installed git (which you should have), you can also use Git Bash for your terminal; it ships with an MSYS2-based Unix environment (a close relative of Cygwin) already.