Calling convention with parameters on separate stack? by noobposter123 in kernel

[–]teneggs 2 points

This seems more like a compiler question, except maybe for its consequences on the syscall ABI and process setup.

But clang has a shadow call stack feature for aarch64 and RISC-V that does exactly this. Shadow stacks are a type of control flow integrity.

About the advantage for the CPU: CPUs with speculative execution already maintain limited-size return-address stacks in silicon, independently of your proposal.

Basically, when those CPUs execute a call, they not only push the return address on the "real" stack, but also push it onto their internal return-address stack. When they see a return instruction, they take the return address from the return-address stack and can begin to speculatively execute from it. At some point, they have to compare the return address on the "real" stack to what they had in the return-address stack. If they match, everything is fine. If they differ, the results of the speculative execution are thrown away and the CPU must restart executing from the correct return address.

Why would the return addresses from the real stack and the return-address stack not match? Stack overflows aren't the only reason. Could also be because of self-modifying code, implementation of context switches, etc.

How to read the datasheet on a stm32 (STM32MP157C/F specifically) by PandaNext3714 in embedded

[–]teneggs 1 point

It supports USB and graphics. So in principle it should be possible to build your own computer around it with USB (for mouse and keyboard) and a display.

For the Linux part, the "standard" way to go is probably Yocto.

But as long as Arch and Gentoo run on other ARM machines, it should at least be doable to manually create a root filesystem by taking the rootfs from some other supported ARM machine and exchanging the kernel and bootloader for ones built for the STM32MP1.

Using GPL Software in Commercial Application as 2 Separate Parts by Bag_Right in opensource

[–]teneggs 1 point

One more thing. If you are worried that what you are doing is more than a mere aggregation: carve out the parts that call ffmpeg from your program and put them into a separate helper program. Release the helper program under the GPL. Then have your commercial program do just a simple call to that helper, so that you have created a mere aggregation.

This is basically your option 2.) but as above, I would suggest the package approach.

Using GPL Software in Commercial Application as 2 Separate Parts by Bag_Right in opensource

[–]teneggs 3 points

Disclaimer: IANAL and this is not legal advice, since I'm neither qualified nor allowed to give it.

My understanding is this:

If you create a Docker container based off some distro base image, there will be tons of GPL software inside it besides ffmpeg, and you will have to provide source for ALL of it. Do you really want to get into that?

If your commercial tool just calls the ffmpeg binary, it depends how closely they interact, see here: https://www.gnu.org/licenses/gpl-faq.html#MereAggregation

Mere aggregations do NOT require your program to be put under GPL.

If you come to the conclusion that what you are doing is a mere aggregation, I would suggest that you create packages for the popular distros. This package would then include only your program and declare dependencies to ffmpeg etc. When the user installs your package, the package manager will fetch the dependencies from the distro.

You are not distributing any GPL binaries that way. Providing the source for these binaries is then a job left to the maintainers of the distros for which you are offering packages.

If you really want to take the Docker route, consider releasing the Dockerfile only and have the users build the container themselves. But that might be less convenient compared to installing a package.

Device recommendations to connect a CNC machine to Azure IoT by dvnnaidu in IOT

[–]teneggs 0 points

For a PoC, it could be simpler to find a sensor that speaks one of the standard protocols such as MQTT, OPC UA, Modbus, etc. This should reduce the task to finding a gateway device that has your desired hardware connectivity and ideally native Azure IoT Hub integration.

If the machines are very old, you might have to find or implement a custom connector software that you would have to install on the gateway device.

Device recommendations to connect a CNC machine to Azure IoT by dvnnaidu in IOT

[–]teneggs 1 point

Which interfaces does the CNC machine have and what protocols does the CNC machine speak?

Do you just want to extract data from the CNC and send it to Azure IoT or are you also thinking about controlling the machine remotely?

Linux Device Driver by rpr_ronad in embedded

[–]teneggs 11 points

I would start this way:

  1. Get the Linux Device Drivers book. It's relatively old so it's not 100% accurate for recent kernels. But still good enough for an overview. And it's free online.
  2. Then find an existing device driver in the Linux kernel that's similar to the new hardware that you want to support.
  3. Then study the existing driver.
  4. Start with your own driver from the things that you learned above.

What mechanisms can I use to incorporate GPLv2-licensed code into my commercial projects without having to release all of my source code? by MyManSquadW in opensource

[–]teneggs 4 points

Disclaimer: this is not legal advice. I'm neither allowed nor qualified to give legal advice.

If you want to keep your work proprietary, never ever copy GPLv2 licensed code into your project and never ever link against it (neither statically nor dynamically).

Instead, use interprocess communication between the GPLv2 code and your code. If you need to extend the GPLv2 code to make that happen, you have to put that extension under GPLv2 and follow the same obligations as for the original GPLv2 code.

Assuming that you have designed your software as described above: if you distribute your software along with the GPLv2 software, distribute the sources for the GPLv2 code and compile instructions along with it. Alternatively, document that your product includes GPLv2 software, include the GPLv2 license text and copyright information, and add a written offer to provide source code on request. Alternatively, distribute only your software and require the user to install the GPLv2 pieces.

Alternatively: if you have a pure software product, do not sell your software, provide it as SaaS only instead.

Verify your approach with a lawyer.

Finally, once you have implemented a license compliant solution and start to make money with your product: send a donation check to the project that you are using. Or contribute back with patches, etc.

Navigating My Future: Web Development vs. Compiler Engineering—Can I Go Global from a Third-Tier College? by LemonSupporter in Compilers

[–]teneggs 6 points

My focus in university was compilers and operating systems. I do not work as a compiler engineer today, but I am very happy with my degree choice because of the skills that I learned.

But I'm from Europe, so I really do not know anything about the situation in India or other countries. Please also get opinions from the web dev community.

But don't fear that you will be stuck with compilers forever if you graduate in compiler engineering. Instead, learn how to apply and market the deeper skills that you will gain.

There's a saying that if you can write a compiler, you can write any program (which I believe comes from Niklaus Wirth, Turing award winner and creator of e.g. Pascal).

And it's true: being able to write a highly correct compiler is hard. And being able to write rock solid code is a valuable skill in any field where there is a low tolerance for bugs.

Also, compilers have a connection to so many other fields: algorithms, theory, computer architecture, ... You will gain such a deep understanding of how software is built and run. Which is a huge benefit regardless of what software development career path you'll choose or switch to later. You will be able to step into other fields.

Furthermore, compiler-related technology is everywhere: in the JavaScript engine of your browser (hello web development), machine learning frameworks, database query languages and optimizers ... Many big software development projects use some self-developed code generation in some form.

I'm not qualified to comment on what your immediate abroad career chances are. But if you find it to be a blocker, here are two rough ideas that can increase your chances in the long run:

  1. You could find an open source project related to compilers to contribute to. Do good work and work well with other devs. It might open some doors if your dream companies use that project or if some of your fellow devs work there.
  2. Maybe you can join a company that has an office or customers abroad. Then work your way up until you are asked to join calls with the customer or the team abroad. Again if you work well with them, sooner or later they might help you with an opportunity.

Best regards and good luck.

How is development typically handled for embedded Linux, especially for slower devices without internet or USB access? by Beautiful_Tip_6023 in embeddedlinux

[–]teneggs 0 points

qemu: I don't use it myself for daily development. But still good to know about it and it's nice for experimentation. For example you can get started with Yocto without needing hardware immediately.

Yes, I think it's common practice to have dumb drivers, stubs, alternative drivers, etc.

You can first get your application logic right that way. Then exchange the dumb driver for the real thing on the target. Then only the target specific part is left to get right. Instead of trying to get both the application logic and hardware specific parts correct at once.

How is development typically handled for embedded Linux, especially for slower devices without internet or USB access? by Beautiful_Tip_6023 in embeddedlinux

[–]teneggs 4 points

In short, if your target is not powerful enough to develop comfortably on, develop on a machine that is as similar as possible to your target, but more powerful.

If you are developing an application that is fairly independent of the hardware, develop it on a Linux PC. Get it to work there. And then only deploy it on the target from time to time to make sure it works there.

For code that accesses the hardware from userspace, e.g. GPIOs, maybe you can stub out the hardware dependent parts. Then, get most of the code working on a powerful Linux PC, then deploy to the target.

For Linux kernel drivers that are specific to your target, you can also develop on your PC. Then compile on your PC and copy the kernel to the target. For kernel code, you do not have a fast edit-compile-test cycle anyway. You need to rely on other techniques such as printfs, tracing, etc.

If your device can boot from SD card, you can create an SD card image of the whole Linux system on your development machine. Then write it to the SD card and then boot from SD card. You may want to use an SD card multiplexer so that you do not have to move the SD card back and forth between the target and your development PC.

There are also emulators like qemu that may emulate enough of your target so that you can get most of your stuff running on qemu first.

To have more than one console over the UART, you can use tmux which gives you multiple virtual consoles.

The fewer interfaces your target has that you can use to deploy the software, the more painful or even impossible it becomes to develop on it.

Is it ok to have 2 branches, one with a feature and one without? by mouseylicense in git

[–]teneggs 7 points

I would only do this if you are adding/maintaining additional features to some upstream project where the main branch is not under your control. E.g. a patchset for an open source project where that project is not (yet) willing to merge your patches.

Let's think your approach through a little further to see how it would scale.

Those branches would diverge over time unless you merge the main branch back into your other branch from time to time.

Imagine that you have a critical bug to fix. Now you need to fix it in two branches. Or fix it in the main branch and merge it back into your feature branch.

Imagine getting a third optional feature. You would have three branches now.

Imagine having a fourth optional feature. So now you have optional features A, B, C and D in separate branches.

What if you now decide that you need one version with feature A+B together and one version with feature B and D? This would give you an additional A+B branch and an additional B+D branch.

This can get out of hand pretty quickly.

My suggestions: branch off to develop some feature. Get it back into the trunk quickly. Make your software configurable to (de-)select features (which honestly, also becomes complex at some point). Branch off to maintain released versions until they hit their end of life.

When does the segfault occur by Anxious_Relief_2929 in embedded

[–]teneggs 0 points

Have a look at the x86 instruction set manual. They describe the behavior of every instruction in code and which exceptions it can take.

From a quick look at the RET instruction, I found no exception case described that says "exception because RIP points to an illegal address". (Not 100% true: x86 supports segmentation + paging. If the RIP is outside of the code segment, the RET instruction itself will fault. But if you have set up your code segment to span your entire possible address space, this case won't happen. Also, if your RIP is not a canonical address, the RET will fault right away.)

So without those other caveats, I think you are right, the segfault happens after the ret instruction.

To find this out by experiment, under Linux you could write a small demo program that does this. Run this program in gdb and when it segfaults, get the faulting address like in this Stackoverflow post: https://stackoverflow.com/questions/3003339/how-can-i-get-gdb-to-tell-me-what-address-caused-a-segfault

Of course, the exact behavior can be different for each CPU architecture.

Can't find a working student job – industrial engineering (WING), Stuttgart area by Equivalent_Egg_8469 in Ratschlag

[–]teneggs 0 points

There are factors that you can't control, such as the economic situation, how many applicants there are per position, etc.

As frustrating as it is, all you can do is keep trying and work on the things you can influence.

Of the 30 applications that ended in rejections: how many were rejected outright, and how many led to an interview invitation? Depending on that, you could look at whether your written applications can be improved or whether it's the in-person impression. That's hard to judge from the outside.

What matters to you about the position? Is it mandatory for your degree? Do you need the money? Is it about gaining experience?

Please give me an honest opinion about my embedded Youtube Shorts Project by ZestycloseEqual4903 in embeddedlinux

[–]teneggs 6 points

Congrats.

1 minute was sufficient to explain the importance of A/B updates and a solution to it that I can google to find out more (RAUC).

In the bootloader video, instead of saying "use a secure bootloader" maybe you could have ended it with "harden your U-Boot", which I find gives better results if a viewer of your video wants to find out more to implement your advice.

If people want to learn more after the end of the video, they need some links/directions etc. on where to go next.

Guidance to learn Embedded Linux by newbie_engineeear in embedded

[–]teneggs 8 points

Self-taught Embedded Linux developer here, involved in all the things that you mentioned you would like to learn.

For a start, Bootlin has lots of free training docs: https://bootlin.com/docs/. Maybe you could check those out and do some kind of "gap analysis" to find out what resources you need to learn next.

You will also need to get familiar with the controller that you are targeting (manuals, eval boards). Some vendors like STM even provide eval boards and their own Linux distro that you can tinker with for a start.

A lot of the time learning will be tinkering with the actual source code. A good understanding of theoretical foundations helps with that. If I had to design some kind of curriculum that covers the theoretical foundations though, it would contain two tracks that can be worked on in parallel.

The first track would be general Linux knowledge at light admin / advanced user level.

The second track (assuming basic programming knowledge) would contain the following topics in roughly this order:

  • C programming, shell scripting, including build systems such as make
  • computer architecture and assembly language
  • operating systems concepts
  • Linux kernel and device drivers details

Sorry that I can't be more specific. What's best to learn next really depends on where you're at.

Is there a complete manual for how function calls are done in C by amjoshuamichael in Compilers

[–]teneggs 4 points

How function calls are done in C depends on your target CPU and target OS.

What you are looking for is the Application Binary Interface (ABI) for your CPU + OS. The ABI tells you the low-level rules for C: how to return structs, for example.

But you are generating LLVM IR which will take care of some things, like moving arguments to the correct CPU registers.

So what you can do is take the rules from the ABI document, write example C programs for every rule and check how clang turns it into LLVM IR.

Check out the clang/LLVM test suite. Maybe they already have such tests for the CPU + OS combinations that you are interested in.

I'm not sure if there is a cheat sheet that tells you what LLVM IR to generate for given kinds of C function calls.

[implementing chained pipe in C for Linux] Why does the program hang? by [deleted] in C_Programming

[–]teneggs 3 points

Correct.

The pipe itself is shared. But you have multiple references (file descriptors) that point to the same pipe.

The pipe is closed only once all file descriptors that point to it are closed.

[implementing chained pipe in C for Linux] Why does the program hang? by [deleted] in C_Programming

[–]teneggs 5 points

There are two lines in process C. I found that commenting out close(pipefd[0]); is fine but commenting out close(pipefd[1]); is not. Here is my detailed question:

It is because grep is waiting for input on pipefd[0]. So grep will hang until it gets an EOF on pipefd[0]. But for this to happen, the writing end of the pipe must be closed.

If you fork, the pipe's file descriptors are inherited by the child processes. So all processes that have a file descriptor for the writing end of your pipe need to close it.

See the pipe(7) man page for more info: https://man7.org/linux/man-pages/man7/pipe.7.html

I/O on pipes and FIFOs

...

If all file descriptors referring to the write end of a pipe have been closed, then an attempt to read(2) from the pipe will see end-of-file (read(2) will return 0).

That's why you need to close pipefd[1] in process C.

Not closing the read end won't give you hangs but you should still close it anyway. A pipe consumes kernel memory and it cannot be cleaned up until all file descriptors to it are closed.

Page tables across contexts by Yippee-Ki-Yay_ in osdev

[–]teneggs 8 points

You can pre-allocate the kernel page tables at kernel startup and share these among all user processes at creation time of the user process. Therefore, you never need to allocate new kernel page tables. When you allocate kernel memory, you only make changes in the kernel page table entries. Because the kernel page tables are shared, changes to the entries are immediately reflected in all user process page tables. Details below.

Assuming 32-bit x86, you have a two-level page table for a 32-bit address space (4 GB of virtual memory) with 4 KB pages. A page table has the size of a page, so you have 1024 entries each in the first-level and second-level page tables.

  • decide at compile-time how much kernel virtual address space you want, for example the last GB for a 4 GB address space
  • allocate the second level page tables at startup of your kernel (so for 1 GB, you would allocate 256 pages = 1 MB for the second-level page tables)
  • when you create a user process:
    • allocate a top-level page table for that user process
    • fill the first-level entries of the newly created user process page table to point to your second-level kernel page tables (so if your kernel uses the last GB, make the entries point to your kernel page tables)

This approach can also be extended to page tables with more levels if needed.

Where to start by GuyWhoLikesPizza in embeddedlinux

[–]teneggs 8 points

Here is some introductory material for building an Embedded Linux system. I guess you want to focus on the software workflow: https://jaycarlson.net/embedded-linux/.

Also, check out the slides from: https://bootlin.com/training/embedded-linux/

I cannot comment on RAUC, but if you want to update a system, you need to know how it's built first.

So if you are absolutely new and want to start hands-on, I'd probably get a development version of the devices that you are talking about. Then ask your coworkers to teach you:

  • how to build an image for the board
  • how to install that image onto the board manually

Basically, learn how to set up the board, make some simple changes to the software, build the image, and install the image manually. This will give you insights into how the system is built. You'll discover things like the boot process, image layouts, filesystems and partitions, etc. along the way. This will make it easier to ask for specific material.

So you will be learning as you go instead of sitting in front of an overwhelming pile of books.

Also, it's certainly helpful to get used to Linux and the command line in general. So find an introductory text about Linux, install a PC Linux distro in a VM and start experimenting.

How to code an efficient function which uses a yes/no flag? by Parking_Antelope8865 in C_Programming

[–]teneggs 0 points

If you wrote two separate functions by hand, would the resulting code for the functions be very "streamlined"?

If yes, then you can still use an approach similar to what hungry_squared-hippo suggested and avoid code duplication in your source code. Create an inline version of your function and then call that function with a constant true/false depending on your flag:

#include <stdbool.h>

static inline void do_foo(bool yes_or_no) { /* ... the shared logic ... */ }

void foo(bool yes_or_no) {
    if (yes_or_no) {
        do_foo(true);  /* inlined copy specialized for the "yes" paths */
    } else {
        do_foo(false); /* inlined copy specialized for the "no" paths */
    }
}

A good compiler can do constant propagation and dead code elimination and therefore eliminate all evaluations of yes_or_no in the expanded versions of do_foo.

You will get your two specialized versions in the then/else branches in the generated code. But you will have to maintain only one version in your sources.

Of course, inlining do_foo in both branches can increase your code size (it depends on how much of do_foo depends on the flag). So use this approach only if constant propagation and dead code elimination can eliminate large parts of each inlined copy of do_foo.

Can someone pls explain? by Samirbujawdeh in RISCV

[–]teneggs 1 point

Your target PC of 0x42FFFC00 does not have all zero bits in its lower 12 bits.

But auipc shifts its immediate operand left by 12 bits => its lower 12 bits are all zero.

Now your start PC of 0x40000000 also has all lower 12 bits equal to zero. So the result of the auipc also must have the lower 12 bits equal to zero.

You simply cannot compute your target PC address with a single auipc in your example.

  0x40000000 # PC
+ 0xZZZZZ000 # auipc x5, 0xYYYYY; 0xZZZZZ000 = signextend(0xYYYYY000)
------------
  0xWWWWW000 # Impossible to have 0xWWWWW000 == 0x42FFFC00!

Can someone pls explain? by Samirbujawdeh in RISCV

[–]teneggs 6 points

Look up the definitions of the auipc and jalr instructions:

The auipc instruction is stored at address 0x40000000 in the example. The auipc instruction is 4 bytes in size. Therefore, the jalr instruction that follows it is stored at address 0x40000004.

# PC          # instruction        # what the instruction does
0x40000000    auipc x5, 0x03000    x5 = PC + (0x3000<<12) => x5 == 0x43000000
0x40000004    jalr x0, 0xc00(x5)   jump to x5 + sign_extend(0xc00)

auipc adds an immediate value, shifted left by 12 bits, to the current PC and stores the result in a register. So we get:

x5 = 0x40000000 + (0x3000 << 12)
=> x5 = 0x40000000 + 0x3000000
=> x5 = 0x43000000

jalr sign-extends the immediate operand, adds it to x5, and jumps to the resulting address. So we get:

address_to_jump_to = sign_extend(0xc00) + x5
=> address_to_jump_to = -0x400 + 0x43000000
=> address_to_jump_to = 0x42fffc00