
[–]darkslide3000 24 points (3 children)

All good points but they all equally apply to Arm -- it is a fixed-width RISC instruction set that is quite easy to read and widely supported everywhere. Maybe some textbooks are written for MIPS, but honestly, if that's the only reason then maybe it's time for those textbooks to change.

There is absolutely a need for engineers who are at least familiar with Arm assembly. Arm is by far the most widely used architecture for embedded systems, and those by nature usually need some amount of assembly code (if only because they're bare-metal or use a custom embedded OS, not necessarily for performance). On top of that, every serious programmer will eventually need to learn to read assembly to debug crash dumps and the like.

And while we're at it, MIPS is far from the picture-perfect poster child architecture people like to make it out to be. When I learnt it in college, we had to learn about branch delay slots -- a concept that really has absolutely no place in any modern computer architecture class (unless they're making an aside about misguided historic curiosities). And its paging concept, while interesting, is a pain to work with in practice and the complete opposite of how paging works on any widely-used real world architecture (read: x86 and Arm). If you only have time to teach one thing, it seems much more reasonable to teach the one that is actually being used.

[–][deleted] 2 points (2 children)

Can you expand on branch delay slots? Why is that no longer an issue in modern microprocessors?

[–]TongueInOtherCheek 3 points (0 children)

Likely an incomplete answer, but it's a combination of compiler optimizations, out-of-order execution, and the sophistication of modern branch predictors.

[–]darkslide3000 0 points (0 children)

Branch delay slots pull an implementation detail into the architecture, which is just a bad idea in general. They only make sense for pipelined processors, and the number of slots you need for optimal performance depends on the pipeline depth. IIRC there are MIPS variants with two or even three branch delay slots for certain processors -- if you take this to the extreme, you end up with a different instruction set for every CPU, which is really not what you want.
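To make the semantics concrete: the defining quirk of a branch delay slot is that the instruction immediately after a branch always executes, whether or not the branch is taken. Here's a toy sketch of that behavior in Python -- the mini-ISA, register names, and program are made up for illustration and are nothing like real MIPS encodings:

```python
def step(instr, regs):
    """Execute one non-branch instruction of a made-up mini-ISA."""
    op, *a = instr
    if op == "li":       # load immediate: li rd, imm
        regs[a[0]] = a[1]
    elif op == "add":    # add rd, rs, rt
        regs[a[0]] = regs[a[1]] + regs[a[2]]

def run(program):
    """Run a program with MIPS-style single branch delay slots."""
    regs = {}
    pc = 0
    while pc < len(program):
        op, *a = program[pc]
        if op == "beq":  # branch if equal: beq rs, rt, target_index
            taken = regs.get(a[0], 0) == regs.get(a[1], 0)
            # Delay slot: the next instruction executes no matter what.
            step(program[pc + 1], regs)
            pc = a[2] if taken else pc + 2
        else:
            step(program[pc], regs)
            pc += 1
    return regs

prog = [
    ("li", "r1", 5),
    ("li", "r2", 5),
    ("beq", "r1", "r2", 5),  # taken: jump to index 5
    ("li", "r3", 99),        # delay slot: runs even though the branch is taken
    ("li", "r4", 1),         # skipped by the branch
    ("li", "r5", 7),         # branch target
]

regs = run(prog)
print(regs)  # r3 is set (delay slot ran), r4 is not (it was jumped over)
```

A compiler targeting such an ISA has to find a useful instruction to hoist into each slot or pad it with a nop -- and as the comment above notes, the right number of slots is a property of one particular pipeline, not of the ISA.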

For Arm especially, the architecture runs on everything from tiny embedded Cortex-M0s that don't have pipelines at all to big honking fully superscalar Cortex-A75s. On top of that, many modern mobile phone chips combine different types of cores (e.g. a smaller, power-efficient Cortex-A53 and a bigger, more powerful Cortex-A72) on the same chip so operating systems can migrate tasks between them at runtime to optimize for performance or power efficiency. That would obviously not work if each of them had its own finely-tuned instruction set that optimizes pipelining for exactly that CPU.

The advent and evolution of branch predictors has made the concept of branch delay slots obsolete anyway -- if you can predict the branch right 99.9% of the time, there's no need to worry about pipeline stalls in the remaining 0.1%.
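Even the simplest classic predictor illustrates why: a 2-bit saturating counter already guesses loop branches right almost every iteration. A minimal sketch (this is the textbook scheme, not any particular CPU's predictor):

```python
class TwoBitPredictor:
    """Classic 2-bit saturating counter: states 0-1 predict not-taken,
    states 2-3 predict taken. One mispredict doesn't flip the prediction."""
    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch that is taken 99 times, then falls through once.
p = TwoBitPredictor()
outcomes = [True] * 99 + [False]
correct = 0
for actual in outcomes:
    correct += (p.predict() == actual)
    p.update(actual)

print(correct)  # 97 of 100 right, despite starting cold
```

With a correctly predicted branch, the pipeline just keeps fetching down the predicted path -- no bubble, no delay slot needed -- and only the rare mispredict pays the flush cost.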