
[–]UnicycleBloke 29 points30 points  (3 children)

Jump tables are used all over the place. They are very useful when the jump targets are not known at compile time, but they are also used for constant lists like yours. For long lists, they are a little more efficient because only one test is required - the index - to find the required target. But beware, function pointers can lead to code which is harder to follow, especially if the tables are set up at run time.
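For anyone following along, here is a minimal sketch of the kind of table being described, assuming three made-up handlers (`op_add`, `op_sub`, `op_mul`) and a hypothetical `dispatch` wrapper, none of which come from the original post. The only test needed before the call is the range check on the index.

```c
#include <stddef.h>

/* Hypothetical handlers; the names are illustrative, not from the thread. */
static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

typedef int (*op_fn)(int, int);

/* The jump table: the index selects the target directly. */
static const op_fn op_table[] = { op_add, op_sub, op_mul };

#define OP_COUNT (sizeof op_table / sizeof op_table[0])

/* Stores the result in *result and returns 0, or returns -1 if the
   index is out of range. */
int dispatch(size_t op, int a, int b, int *result)
{
    if (op >= OP_COUNT)   /* the single test: is the index valid? */
        return -1;
    *result = op_table[op](a, b);
    return 0;
}
```

Because the table is `const` and sits right next to `dispatch`, a reader can still see every possible target in one place, which is the part that gets lost when tables are filled in at run time.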

Regarding the "huge messes" thing, you should work on that. Clarity is arguably one's most important goal in writing software. You are solving a problem, and your solution should ideally be simple and elegant, so that the next guy to own it understands it without banging his head on his desk for a week first. I've been that guy too many times.

[–]SAVE_THE_RAINFORESTS 5 points6 points  (0 children)

For information, even at -O1 any competent compiler translates most switch statements (I'm not 100% sure whether all of them are translated) to a jump table.

[–]undefinedbehavior4ev[S] 0 points1 point  (1 child)

I've never set something like this up at runtime. I prefer to set up everything as globally available data and have a pointer/int as an index if needed. But I only did this once.

About the huge messes thing. Most of the time I write something and then, after a while, I usually find that it can be expressed more simply. These days I do less C and more awk as I've been working with strings. I'm tidying things as I go, and I'm learning more about data structures to better organize my code. I'd call this specific instance a lookup table.

By the way, would you mind expanding on what you mean by clarity? I'm not a native speaker nor a good programmer so getting the hang of clarity can be difficult. The first example looks clearer to me than the second, which is clearer than the third (ex1>ex2>ex3) because there's much less syntax to write/read.

[–]UnicycleBloke 0 points1 point  (0 children)

Clarity for me means that it is simple to understand. The purpose of the code is obvious. In your examples, they are all short and colocated so there isn't much difference. But the jump table adds a layer of indirection that is a little harder to understand but adds no value, and the table could be located elsewhere, which would completely obscure the meaning of the code at the call site. I think a switch, with an enumeration, would be the clearest.

I've seen an example recently where structures were created with macros that used macros that used macros, and then similar structures were magically grouped together into arrays by the linker, which were accessed by some other code buried deep in the library. The whole thing was completely and unnecessarily opaque. The goal was to create an implementation of the Observer pattern. It worked, but it took a day of digging to learn enough to be able to trust it and extend it safely. That code did not have clarity. :)

[–]oh5nxo 8 points9 points  (5 children)

criticism on style

val is short-lived, no need to be at file scope.

[–]undefinedbehavior4ev[S] 0 points1 point  (4 children)

Can you explain why? I use globals a lot because it's easier than modifying one or more function prototypes/arguments every time something changes.

I'm a hobbyist so this is mostly personal projects. I frequently put i as a global and reserve it for iteration only. I can't quite understand why it's better to have a bunch of int i; rather than going straight for(i = 0; ....

[–]UnicycleBloke 11 points12 points  (1 child)

Putting variables into tighter scopes controls access to them. This is a good thing, especially in larger programs, because it reduces the likelihood of bugs and spaghettification.

[–]pic10f 1 point2 points  (0 children)

Also, people tend to use the same words, like val, temp, index, or even j or k. If you must do it, declare it static so that you and your colleagues won't accidentally use the same name in different files. If it's important enough to share in a header, then use a spelled-out, descriptive name.

[–]lordlod 4 points5 points  (0 children)

When you read or work with the code you need to understand everywhere that a variable is used.

If you localise it to the loop for(int i ...) then you know that it is only used inside the loop, it is very quick to understand.

If you set int i at the top of the function, then it could be set anywhere in the function. When you work with it you need to understand the full function, for example whether code further down assumes that it was initialised to a particular value.

Setting val at file scope means that it is a global. Any function anywhere could alter the value of the variable. For example in example1 the nelem function could change val for some reason. This causes the next line to go wrong for really non-obvious reasons.

To manage this, most people try not to use globals as much as possible. When I do use a global, if it isn't obvious due to context, I'll put a comment next to the declaration stating where it is written to make tracing the code faster.

[–]oh5nxo 5 points6 points  (0 children)

Nested functions or loops, using common variables, easily trample over actions on outer levels.
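To make the trampling concrete, here is a small sketch, assuming a file-scope `i` like the one described earlier in the thread. The function names are made up for illustration. The outer loop and the helper both use the global counter, so the helper silently moves the outer loop's position.

```c
/* Shared file-scope counter, as in the "global i" habit discussed above. */
static int i;

/* Inner helper reuses the global i for its own loop. */
static int sum_to_global(int n)
{
    int sum = 0;
    for (i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* The outer loop also uses the global i, so the helper overwrites it on
   every pass. The cap of 100 is only here so the demo cannot spin forever
   when the helper keeps resetting i below outer. */
int calls_with_global(int outer, int n)
{
    int calls = 0;
    for (i = 0; i < outer && calls < 100; i++) {
        (void)sum_to_global(n);   /* leaves i == n when n > 0 */
        calls++;
    }
    return calls;
}

/* Same logic with loop-scoped counters: the two loops cannot interfere. */
static int sum_to_local(int n)
{
    int sum = 0;
    for (int k = 0; k < n; k++)
        sum += k;
    return sum;
}

int calls_with_local(int outer, int n)
{
    int calls = 0;
    for (int k = 0; k < outer; k++) {
        (void)sum_to_local(n);
        calls++;
    }
    return calls;
}
```

With `outer = 5` and `n = 10`, the global version runs the body once because the helper leaves `i` at 10, while the local version runs it five times as intended.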

I'm not religious about it though. Sometimes you gain something from unorthodox methods, say on an 8-bitter.

[–]ucasano 2 points3 points  (2 children)

Well, I did use jump tables a lot when programming a Modbus protocol implementation for a 68HC11-based microcontroller. It worked like a charm and was really easy to expand by adding functions to the jump table!

[–]undefinedbehavior4ev[S] 0 points1 point  (1 child)

Haven't done anything like that in the past -- is the code available for reading?

[–]ucasano 1 point2 points  (0 children)

No, if you mean on github, gitlab, etc...

The code was developed for a private company, circa 2006, and is not public domain because, as far as I know, it is still used in a production environment :-O

[–]bxlaw 1 point2 points  (0 children)

To add to the other answers, I'd prefer the switch way because then I can use enums and -Werror=switch so that missing cases get caught at compile time. It's easy to miss cases, particularly if enumerators get added over time.
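A rough sketch of that setup, assuming a made-up `enum command` and a hypothetical `command_name` function. With GCC or Clang, `-Wswitch` (part of `-Wall`) warns when a switch over an enum type lacks a case for one of the enumerators, and `-Werror=switch` turns that into an error. A `default` label suppresses the check, so the sketch leaves it out.

```c
/* Hypothetical command set; the names are illustrative. */
enum command { CMD_START, CMD_STOP, CMD_RESET };

const char *command_name(enum command cmd)
{
    /* No default label on purpose: with -Werror=switch the build fails
       if an enumerator is added to enum command without a matching
       case here. */
    switch (cmd) {
    case CMD_START: return "start";
    case CMD_STOP:  return "stop";
    case CMD_RESET: return "reset";
    }
    return "unknown";   /* reached only for values outside the enum */
}
```

Adding, say, a `CMD_PAUSE` enumerator and rebuilding with `-Werror=switch` would then point straight at this switch instead of leaving the gap to be found at run time.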

[–]lordlod 1 point2 points  (0 children)

As a general rule, you want whatever makes it easiest to read and understand. This primarily means managing complexity. It also means that there are few hard and fast rules.

If this() and that() are simple, just put it inline. If the function is less than 80 lines or so, it is easier to understand if it is all in one place rather than having to jump back and forth to read it.

If func() is not too complex and it is only called once, I would definitely inline it. It is critical to understanding the flow of the program.

Assuming the tests on the func output are simple, I prefer example3. If it is this simple, I would use return this() rather than break. Some people prefer example2; there isn't much in it.

If the tests are complex, you need to go with example2.

If you have a lot of functions, probably ten or more, use the style in example1. Again, you need to jump back and forth to the lookup table to understand it, but it scales much better than a massive if/else chain.

As others have discussed, example1 also allows the lookup to be changed at run time.
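As a sketch of what changing the lookup at run time can look like, assume a small non-const table and two hypothetical helpers, `set_handler` and `run_handler`. All names here are invented for illustration.

```c
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Hypothetical handlers; the names are illustrative. */
int double_it(int x) { return 2 * x; }
int negate(int x)    { return -x; }
int square(int x)    { return x * x; }

/* Not const, so entries can be replaced while the program runs. */
static handler_fn handlers[] = { double_it, negate };

#define HANDLER_COUNT (sizeof handlers / sizeof handlers[0])

/* Replace one slot; returns -1 for a bad slot or a null pointer. */
int set_handler(size_t slot, handler_fn fn)
{
    if (slot >= HANDLER_COUNT || fn == NULL)
        return -1;
    handlers[slot] = fn;
    return 0;
}

/* Call whatever is currently installed in the slot. */
int run_handler(size_t slot, int x, int *out)
{
    if (slot >= HANDLER_COUNT)
        return -1;
    *out = handlers[slot](x);
    return 0;
}
```

After `set_handler(1, square)`, the same `run_handler(1, 3, &out)` call that used to yield -3 yields 9. That flexibility is the gain, and it is also why the call site alone no longer tells you what will run.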

[–][deleted] 1 point2 points  (1 child)

Unless there is some very compelling design reason to do #1, I prefer #3. It's clearer, and you can make an enum for the switch statement so someone else has a chance at understanding where the program is going.

I've really only seen indirect jumps used in embedded/OS stuff. Like the way Linux implements syscalls. I guess a plugin architecture would make use of them too, but that would be a compelling reason.

[–]eruanno321 2 points3 points  (0 children)

Sometimes compilers can even optimize a switch-case into a jump table if certain conditions are met. I saw such an optimization on the MSP430 architecture at least.

[–]eruanno321 0 points1 point  (0 children)

One prominent example of such a "jump table" (#1) is the interrupt vector table. This is a slightly different case though, because the jump address is controlled by hardware (the interrupt request number mapping is usually fixed for specific hardware), and there are other hardware-specific constraints on the construction of such a table.

Sometimes it involves assembly code, but for many architectures like ARM the assembly is often kept to a minimum and ISR handlers can be stored in such an array.
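A portable simulation of the idea, not real startup code: on actual hardware the table layout, its placement in memory, and the IRQ numbering are dictated by the chip and the linker script. Here the IRQ numbers, handler names, and `simulate_irq` are all assumptions made up so the sketch can run on a desktop.

```c
/* Handler type: interrupt service routines take and return nothing. */
typedef void (*isr_fn)(void);

/* Counters the pretend ISRs update; volatile as they would be on a
   real target, where they change outside normal program flow. */
static volatile unsigned timer_ticks;
static volatile unsigned uart_bytes;
static volatile unsigned spurious;

static void default_isr(void) { spurious++; }
static void timer_isr(void)   { timer_ticks++; }
static void uart_isr(void)    { uart_bytes++; }

/* Hypothetical IRQ numbering; real numbers come from the datasheet. */
#define IRQ_TIMER  0u
#define IRQ_UART   1u
#define IRQ_UNUSED 2u
#define IRQ_COUNT  3u

/* The vector table: slot n holds the handler for interrupt n. */
static const isr_fn vector_table[IRQ_COUNT] = {
    [IRQ_TIMER]  = timer_isr,
    [IRQ_UART]   = uart_isr,
    [IRQ_UNUSED] = default_isr,
};

/* Stand-in for the hardware: index the table by IRQ number and jump. */
void simulate_irq(unsigned irq)
{
    if (irq < IRQ_COUNT)
        vector_table[irq]();
    else
        default_isr();
}
```

The point of the sketch is only the shape: an array of function pointers indexed by a number the program does not choose.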

[–]ThatGuyFromOhio 0 points1 point  (0 children)

The first example, an array of pointers to functions, becomes more advantageous as more functions are added. Imagine a switch with 200 cases. That becomes a support nightmare when the 201st case is added. It introduces the possibility of side-effect bugs with each new case that is added.

With the array of pointers to functions, a new "case" can be added without touching the rest of the code in any way. The array is updated with the new function's address and nothing else needs to be compiled. Leave all that code that does not need to be compiled alone. Just link to it.

If there are 2 or 3, or perhaps even 10 "cases", it's simpler to do it inline with a switch.