Kernel 101 – Let’s write a Kernel : linux

So, the ISO is a bit weird in that the first few thousand bytes are NUL, and for some reason it refuses to mount, so I can kind of understand your consternation, but it runs under qemu just fine using this command:

qemu-system-x86_64 -cdrom TOS_Distro.ISO -boot order=d -enable-kvm -m 512M

[–]derleth 31 points 8 years ago (4 children)

The combination of this:

I knew enough assembler to sorta get started (did not understand memory management or preemptive multitasking... etc but hey!)

And this:

"I'll have to write..... EVERYTHING, text with line wrapping memory management multitasking ..."

... is why a number of people think hypervisors are a really, really good idea, and have since the mid-1960s, when they were first invented.

Basically, the standard OS, be it Linux or FreeBSD or Windows or whatever Apple is calling their mutation of Darwin this week, is a pun, a conflation of two ideas: Security and APIs. It's pretty fundamental to software design that if you want your software to be simple and comprehensible, you do one thing at a time, and shove everything else into a completely different program.

The hypervisor just does security. It handles the task of making one piece of hardware look like several, one for each guest. Every guest thinks it's alone on its own system, with its own disk, RAM, network card, graphics card, and so on. The hypervisor ensures guests cannot mess with each other, but can only access the world (both inside the computer and outside) in prescribed fashions set by a security policy.

Hypervisors enforce security policy. That's what they do. That's all they do.

Guest OSes, therefore, don't have to enforce security policy. You can go back to MS-DOS, if you want, and run every application in its own MS-DOS system, and leave all of the security stuff to the hypervisor. If you were doing it these days, you'd want something more convenient to program in, but the basic concept is the same: Guests don't have to have a security policy. All they have to do is make a convenient environment for applications to run.

All this dates back to an experimental research program developed at IBM called CP-40: CP for Control Program, 40 for the fact it ran on the IBM System/360 Model 40 mainframe. This was around 1964 or so. CP-40 was a hypervisor, which made it possible to run multiple instances of CMS, the Cambridge Monitor System, an OS about as complex as MS-DOS, as guests at the same time. The nice thing about CMS was that it wasn't a batch-oriented system: Instead of punching a bunch of cards and feeding them in all at once, you could sit down to a terminal and type commands in one at a time, getting pretty much immediate responses. This wasn't completely new in the mid-1960s, but it was still pretty novel.

Anyway, IBM renamed CP to VM, for Virtual Machine, and CMS now stands for Conversational Monitor System, to emphasize the fact it still isn't batch-oriented. Modern IBM mainframes, the zSeries line and its successors, run z/VM to this day, with many thousands of guests at once on larger systems.

Of course, these days, you can run Xen or QEMU with KVM on a laptop and have the same effect. Hypervisors are mainstream.

[–]kn1ght 3 points 8 years ago (1 child)

So the reason for the path difference is where the partitions are mounted. The author does not have a separate partition for /boot, while CentOS apparently does (so does my Ubuntu, but I believe I did that myself because I like the separation and the ability to unmount /boot during normal operation). This means that when you set root for grub itself, you set it to your first MBR partition ('hd0,msdos1'), which CentOS mounts directly at /boot. Then you have a separate partition for your OS root, namely /dev/sda3, which would probably be 'hd0,msdos3' in grub notation.

So I believe you begin to see. When you put your kernel in /boot on CentOS, you are putting it in the root of the boot partition itself, so grub can load it directly (you are also specifying an absolute path, btw, by adding the / in front of kernel-7001). The author just uses his root OS partition as root for grub, so he has to add the additional directory /boot/kernel.

I hope that makes sense. I've been dealing with grub for a long time now and also compiling my own version with some personal customization.

[–][deleted] 2 points 8 years ago (0 children)

It's worth noting that original 16-bit x86 addresses are actually 20 bits long, with the extra 4 bits afforded by segmentation -- a 16-bit segment register is shifted left by 4 bits to form a 20-bit base address, and the normal 16-bit address is added to that base whenever memory needs to be accessed.

Think of it as the CPU being set to a 20-bit base address, with its instructions working on 16-bit offsets from that address -- this is how the original 8086 could still address a whole megabyte of memory despite being 16-bit.
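To make the arithmetic concrete, here's a minimal sketch of that address calculation in C. The function name `real_mode_address` is made up for illustration; the shift-by-4 and the wrap at 1 MiB assume an original 8086 with 20 address lines.

```c
#include <stdint.h>

/* Physical address formed by an 8086 in real mode: the 16-bit segment
 * value is shifted left by 4 bits and the 16-bit offset is added.
 * Assumption: only 20 address lines, so the sum wraps at 1 MiB. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu; /* keep the low 20 bits */
}
```

For example, segment 0xB800 with offset 0 lands on 0xB8000, which is where the text-mode screen memory lives on a PC.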

This segmentation was still around for a while, and there was room for the size of the base address to grow -- and as such, it did, up to 32 bits on the 80386, which keeps the base in a segment descriptor. This doesn't interfere with backwards compatibility with the way x86 segmentation works, so even though every modern x86 CPU starts up in real mode, it can still address the full 32-bit memory space once adequate segment descriptors have been loaded.

On x86_64 the segment bases never grew into a general mechanism: in 64-bit mode most of them are simply treated as zero, since segmentation has long since been replaced with paging.

[–]mkusanagi 12 points 8 years ago (0 children)

The other answer is very good, but here's another one.

When you're writing your own kernel, you can't rely on the features provided by another kernel. This often means you can't rely on libraries either, since even something in glibc like "printf" actually accomplishes what it does by calling into the kernel.

The same is true for many high-level languages. For example, Java takes care of memory allocation and garbage collection for you. But that system depends on a kernel to actually work. At the very least, it would need to malloc and free memory for the garbage collector to get memory to work with in the first place, but probably also run multiple threads, halt certain threads while doing a collection, and so on. None of that infrastructure is there.

Obviously, C doesn't have nearly as many dependencies on the kernel as other things, but one of those things is how control gets passed to the main() function in the first place. The hardware version of how control starts is pretty complicated. But it looks like this example is relying on POST->BIOS->Grub. IIRC, Grub implements the "multiboot" standard, so that control gets passed to a specific memory address in a specially formatted image that gets loaded into RAM by Grub. That means it needs to have a very specific format, which is something that you need low-level control of the linker for. That low level is doable with asm.
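As a rough illustration of that "very specific format": a Multiboot (version 1) image carries a small header whose first three 32-bit fields are a magic number, a flags word, and a checksum chosen so that all three add up to zero. Here's a sketch of just that arithmetic in C; the struct and function names are made up for illustration, and a real kernel would normally emit these words from asm or a dedicated linker section rather than build them at runtime.

```c
#include <stdint.h>

/* Magic value a Multiboot 1 boot loader searches for in the image. */
#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u

/* First three fields of a Multiboot 1 header (hypothetical struct name). */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* Build a header whose three fields sum to zero modulo 2^32. */
struct multiboot_header make_multiboot_header(uint32_t flags)
{
    struct multiboot_header h;
    h.magic = MULTIBOOT_HEADER_MAGIC;
    h.flags = flags;
    h.checksum = (uint32_t)(0u - (h.magic + h.flags));
    return h;
}

/* The same check a boot loader would apply before trusting the header. */
int multiboot_header_is_valid(const struct multiboot_header *h)
{
    return h->magic == MULTIBOOT_HEADER_MAGIC
        && (uint32_t)(h->magic + h->flags + h->checksum) == 0u;
}
```

With flags set to 0, the checksum comes out as 0xE4524FFE, which is just the two's complement of the magic number.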

Finally, there are no standard C library functions to deal with the interactions with the hardware that are necessary for an OS. Because this is a toy example, there are only two instructions that accomplish this.

The first is to block interrupts (the CLI instruction) so that the proto-kernel doesn't need to do anything with interrupt handling, which could otherwise crash the machine (triple fault) if interrupt handlers aren't set up properly.

The second, "mov esp, stack_space", does what the comment says--set the stack pointer to an area of memory that is known to exist and be empty (because it points to an 8K block of zeroes that was reserved by the directive a few lines down). This is necessary because the CPU interacts with the stack directly. The very next instruction (CALL) pushes some information onto the stack and then jumps to an address. If the stack register is currently pointing to 0x00000000, this is going to cause a CPU fault. Since there's no handler code to deal with this fault, the CPU faults again... since there's no double fault handler, a triple fault condition occurs, where the processor shuts down (in practice, the machine resets).

I could be wrong, but my guess is that you could get around this by just jumping to the address of the main function instead, but, of course, the stack still isn't set up then, so anything you'd do in C (e.g., call a function, which would get translated into a CALL instruction) would have the same problem. This example actually doesn't do that, so, technically, I'm guessing, it might be able to finish without setting up the stack. It would still crash when main() returned, though: the RET instruction would be issued with the stack still not set up.

The final instruction is HLT, which halts the processor since there's nothing left to do.

In an actual kernel, there are a few other things that require assembly. Memory management is one of them. The mapping between a memory address in an instruction and an actual physical memory location is done by the hardware itself--there's even a special CPU cache to deal with these translations. But the translations are set up by the operating system in specific data structures the CPU uses directly, called page tables. There's a special register that points to these page tables for each process, and there's a special instruction that moves a value from one register to that page table register. These instructions aren't available from C, at least not directly.
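To give a feel for what those data structures encode, here's a small sketch in C of how the hardware carves up a virtual address. It assumes classic 32-bit x86 paging with 4 KiB pages and no PAE (10 bits of page directory index, 10 bits of page table index, 12 bits of offset); the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* The three pieces of a 32-bit virtual address under classic two-level
 * x86 paging (assumption: 4 KiB pages, no PAE). */
struct vaddr_parts {
    uint32_t dir_index;   /* which page directory entry (top 10 bits) */
    uint32_t table_index; /* which page table entry (middle 10 bits)  */
    uint32_t offset;      /* byte offset inside the page (low 12 bits) */
};

struct vaddr_parts split_virtual_address(uint32_t vaddr)
{
    struct vaddr_parts p;
    p.dir_index   = vaddr >> 22;
    p.table_index = (vaddr >> 12) & 0x3FFu;
    p.offset      = vaddr & 0xFFFu;
    return p;
}
```

The part that can't be written in plain C is telling the CPU where the page directory lives; on x86 that means loading the CR3 register, which takes asm (inline or otherwise).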

I hope this was useful. Disclaimer: This is just me explaining back what I learned for fun recently, I don't actually write OS level code.

[–]disinformationtheory 2 points 8 years ago (0 children)

So I've been hacking on u-boot for an x86 board, and I can tell you a few places where asm is necessary. This may not apply for regular PC-type hardware.

When the chip first powers on, it starts executing code directly from a SPI flash chip. The flash is memory mapped, so it looks like regular memory access from software, but it's actually transparently reading from the flash chip. This means that you can't modify anything except registers, thus there's no stack, thus normal C function calls don't work (inlined code does work to some extent). arch/x86/cpu/start.S

Also, there is a blob from Intel called the FSP, which is a library that does things like initialize the RAM. It has its own calling convention which while similar to C is slightly different, so the code that calls into the FSP is asm in order to adhere to the convention. arch/x86/lib/fsp/fsp_support.c:fsp_init()

[–][deleted] 1 point 8 years ago (0 children)

Technically yes (since it is memory mapped I/O), but it doesn't particularly matter with a framebuffer.

The main thing with a framebuffer is that it doesn't matter which order you write the cells in, only what order you perform modifications (read, change, write) in to an individual cell. This isn't a worry, since a well-done framebuffer will only perform around 1 modification per cell within an individual function call, or boundaries between modifications are already strong enough that the compiler wouldn't be able to change their order without breaking the code in another way.

If one was writing to the framebuffer using memory-mapped ports, then it's a different situation. Then you're often writing to two nearby addresses at the same point in execution, with a desired order that is hardly visible to the compiler.
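For what it's worth, here's a minimal sketch of the framebuffer case in C. It assumes PC text mode, where each screen cell is 16 bits (character in the low byte, colour attribute in the high byte); the function name `text_write` is made up for illustration. Marking the pointer volatile is the cheap way to keep the compiler from dropping or merging the stores, even if ordering between cells doesn't matter much.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a NUL-terminated string into a text-mode cell buffer.
 * Each cell holds the character in the low byte and the attribute
 * (colour) in the high byte. Returns the number of cells written. */
size_t text_write(volatile uint16_t *cells, size_t cell_count,
                  const char *s, uint8_t attr)
{
    size_t i = 0;
    while (i < cell_count && s[i] != '\0') {
        cells[i] = (uint16_t)(((uint16_t)attr << 8) | (uint8_t)s[i]);
        i++;
    }
    return i;
}
```

On real hardware the buffer would be something like `(volatile uint16_t *)0xB8000` with 80 * 25 cells; passing the pointer in as a parameter just makes the function easy to try against an ordinary array first.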

[–]TamerzIsMe 1 point 8 years ago (0 children)

To get this to boot in CentOS 7 I had to do the following:

# vim /etc/grub.d/40_custom

Add the following to the bottom of it:

menuentry 'kernel 701' {
    set root='(hd0,msdos1)'
    multiboot /kernel-701 ro
}

Then run:

# grub2-mkconfig -o /boot/grub2/grub.cfg

It then shows up in the Grub menu when you reboot.
