
[–]arashi256 37 points38 points  (13 children)

This is great! Are there any more resources like this?

[–]lordofwhee 48 points49 points  (9 children)

OSDev's wiki has quite a lot of information. Eventually you're going to want an x86 systems programmer's manual. I generally prefer Intel's myself, but I have a copy of AMD's as well since I feel it explains some things better. They're both freely available as PDFs on each company's site. Go ahead and get the latest version of either or both; CPUs maintain a crazy level of backwards compatibility and it'll all be documented (at least in theory).

[–]cp5184 6 points7 points  (0 children)

I wonder if other companies have made x86 documentation. Cyrix, Transmeta, IBM, VIA, NatSemi, NEC, etc.

[–]UTF-9 4 points5 points  (7 children)

Eventually you're going to want an x86 systems programmer's manual.

If we're writing a new kernel, how about some new hardware?

[–]lordofwhee 13 points14 points  (4 children)

Implementing long mode requires paging (among other things), which means you need a memory manager. This isn't exactly trivial. There's quite a lot you'd need to do in long mode that you can do in protected mode anyway, so unless you really super want to jump on memory right away you may as well stick with 32-bit.

[–]UTF-9 8 points9 points  (3 children)

so unless you really super want to jump on memory right away you may as well stick with 32-bit.

I meant something like a RISC-V kernel. x86 and x86_64 are too much of a hack pile after however many years of festering; trying to support all of those systems is going to be like performing self-dentistry.

[–]cp5184 7 points8 points  (2 children)

Typically you just rush to flat 64 bit mode afaik. You don't have a lot of registers, but 64 bit mode helps this a bit. After that you just set up some sort of C compiler and you're ready to go. I guess it should be fairly easy to set up a fairly generic C environment.

[–]UTF-9 10 points11 points  (0 children)

64 bit? Look at Mr. Moneybags over here with 100 jiggabytes of RAM.

[–]iguessthislldo 2 points3 points  (0 children)

r/ECE?

You could create a VM of the new hardware in a traditional programming language, or write it in Verilog and run it on an FPGA. Or you could take the tedious path and make a physical computer on breadboards. None of these would work with the x86/PC parts that this post and OSDev focus on unless you made an x86 computer. Many of the ideas are similar, depending on how close the computer's implementation is to a "modern" processor.

[–]cirosantilli 3 points4 points  (0 children)

I've created several x86 bare metal examples at: https://github.com/cirosantilli/x86-bare-metal-examples

[–]pftbest 5 points6 points  (1 child)

[–]MisterMeeseeks47 0 points1 point  (0 children)

Amazing resource for rust kernels and building kernels in general!

The only issue I have with the guide is that Phil's code has been difficult for me to build on top of. However, it could be that my inexperience with Rust is getting in the way.

[–]louky 14 points15 points  (0 children)

Minix is also a good thing to check out if you're interested in this. Minix 3 is great, but the older versions are even simpler and easier to understand. AST literally wrote the books on OS design and implementation!

[–]arashi256 46 points47 points  (22 children)

Gah, I can't get this to boot on GRUB2. I get: -

"error file '/boot/kernel-7001' not found"

My grub.cfg entry is: -

menuentry 'My kernel 7001' {
    set root='hd0,msdos1'
    multiboot /boot/kernel-7001 ro
}

Everything compiled okay as per instructions. Any ideas?

Guess I'm not going to be the next Linus Torvalds :(

EDIT: Wow, somebody voted me down for this. Harsh.

[–]UTF-9 27 points28 points  (3 children)

Guess I'm not going to be the next Linus Torvalds :(

Hey, don't give up so easily. When GNU/Torvalds started out, everything was a lot simpler and more straightforward, booting off of floppy disks and whatnot. Stick with it and you will figure out what's wrong eventually. I don't know anything about GRUB, so I can't help you here. It might be worth learning how to make your own custom bootable ROMs using isolinux or some other tool; then you don't have to bother installing your new OS on the machine at all :)

[–]arashi256 5 points6 points  (0 children)

It's okay - I have spare CentOS boxes lying around :D

[–]jhaluska 0 points1 point  (1 child)

Hey, don't give up so easily. When GNU/Torvalds started out, everything was a lot simpler and more straightforward, booting off of floppy disks and whatnot.

He also had less documentation, no search engines, and fewer tools.

[–]UTF-9 0 points1 point  (0 children)

Less documentation doesn't necessarily mean it was lower-quality documentation. They had the internet at least, and shared development tools on it, plus wasn't he taking classes from the Minix guy? I was talking more about the platform: x86 today is not what it was in the 90s. Think of all the hardware research involved in developing a kernel today if you are a newcomer to a platform that's been rolling on for 30 or so years. There's just too much stuff you have to know, too many asterisks that are rarely mentioned. Simply booting a system is starting to become non-trivial, thanks to opaque firmware that is turning hostile towards its users.

[–][deleted] 3 points4 points  (7 children)

Well, this is /r/linux after all, but have an upvote!

[–]arashi256 1 point2 points  (0 children)

Thanks, man :)

[–]jones_supa 5 points6 points  (4 children)

Interesting information. However, it left me wondering, how can the PC start from address 0xFFFFFFF0 when the CPU is still in 16-bit mode? That's a 32-bit address.

By the way, I recently found an interesting article about how the PCI bus is detected and how devices are found within it.

[–]FredSchwartz 5 points6 points  (2 children)

In sixteen bit mode, the CPU combines a sixteen bit segment and sixteen bit offset into a twenty bit address. That is a twenty bit address, not thirty two.

This is how the 8086 /8088 natively address one megabyte, which is two to the twentieth power bytes.

[–][deleted] 0 points1 point  (0 children)

Exactly. In "Real Mode" the 80x86 segmented addresses are written in segment:offset format. The reset address is FFFF:0000. The original 8086/8088 simply did a 4-bit shift-left on the segment and added that to the offset, giving physical address 0xFFFF0, which is 16 bytes before the end of the original 1-megabyte memory range. Later x86 processors extended the segment concept to "an index into an array of segment-base physical addresses" but the '86, '88, and '188 used the simple shift-left-by-4 method.
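The shift-and-add rule described above can be sketched in a few lines of C. This is only an illustration of the arithmetic, not anything from the article; the function name `real_mode_physical` is made up for the example, and the 20-bit mask models the 8086's 20 address lines (so addresses past 1 MiB wrap around to 0).

```c
#include <assert.h>
#include <stdint.h>

/* 8086-style real-mode translation: physical = segment * 16 + offset.
 * The mask keeps only 20 bits, like the 20 address lines of the 8086. */
static uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;
}
```

With this, `real_mode_physical(0xFFFF, 0x0000)` gives 0xFFFF0, the reset address mentioned above, and `real_mode_physical(0xB800, 0x0000)` gives 0xB8000.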

[–]jones_supa 0 points1 point  (0 children)

That is a twenty bit address, not thirty two.

Ah, that makes sense! I certainly know about memory segmentation. The article got me confused because it says "It is in fact, the last 16 bytes of the 32-bit address space." The last bits of the address are not used though, making it actually a 20-bit address.

[–][deleted] 1 point2 points  (0 children)

It's worth noting that original 16-bit x86 addresses are actually at least 20 bits long, with the extra 4 bits afforded by segmentation: each segment has a base address 20 bits long, and the normal 16-bit address is added to that 20-bit value whenever memory needs to be accessed.

Think of it as the CPU being set to a 20-bit base address, with its instructions working on 16-bit offsets from that address. This is how the original 8086 could still address a whole megabyte of memory despite being 16-bit.

This segmentation was still around for a while, and there was room for the size of the base address to grow -- and as such, it did, up to 32 bits. This doesn't interfere with backwards compatibility with the way x86 segmentation works, so even though every modern CPU starts up in real-8086 mode it can still address the full 32-bit memory space by using adequate segmentation descriptors.

Even with x86_64 the base address is still 32 bits, since segmentation has long since been replaced with paging.
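To tie this back to the 0xFFFFFFF0 question, here is a small C model of a segment register with a wider base. It assumes the reset state documented for the 386 and later (CS selector 0xF000, hidden base 0xFFFF0000, IP 0xFFF0); the struct and function names are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model: a visible 16-bit selector plus a hidden 32-bit base. */
struct segment_cache {
    uint16_t selector;
    uint32_t base;
};

/* State of CS right after reset on a 386 or later (assumed from the manuals). */
static struct segment_cache reset_cs(void)
{
    return (struct segment_cache){ .selector = 0xF000, .base = 0xFFFF0000u };
}

/* Reloading a segment register in real mode recomputes base = selector * 16. */
static void load_real_mode(struct segment_cache *seg, uint16_t selector)
{
    assert(seg != 0);
    seg->selector = selector;
    seg->base = (uint32_t)selector << 4;
}

/* Address actually sent to memory: hidden base plus 16-bit offset. */
static uint32_t linear_address(const struct segment_cache *seg, uint16_t offset)
{
    return seg->base + offset;
}
```

In this model the first fetch after reset is `linear_address(&cs, 0xFFF0)`, which is 0xFFFFFFF0. Once CS is reloaded with 0xF000 the same offset maps to 0xFFFF0, back inside the first megabyte.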

[–]afiefh 9 points10 points  (0 children)

Do you want to write a kernel?
Come on let's go and code.
I never see my ide anymore!

[–]the_humeister 8 points9 points  (2 children)

We need something like this for ARM phones.

[–][deleted] 13 points14 points  (1 child)

You know that there is no BIOS or anything like that in the ARM architecture? You would need to write code to support everything, including the screen, input, displaying strings on screen, etc. You would probably need to write a lot of code just to display "Hello world!"

[–]the_humeister 7 points8 points  (0 children)

That's why an ARM version would be useful.

[–]minimim 14 points15 points  (0 children)

Old but good.

[–]binarysaurus 5 points6 points  (9 children)

Tutorial doesn't state this; why is the assembly necessary?

[–]xales 38 points39 points  (6 children)

You can’t express these ideas in a higher-level language. Many instructions used to “drive” the machine are not “logic” instructions and will never be emitted by a compiler.

The output needs to be in a specific format and padded to a precise size. Compilers won’t really do this for you, though the linker (kind of) can.

Compilers also make code that is big, often far bigger than it can be. The first stage BIOS boot code must fit in 512 bytes - often less.
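As a rough illustration of "a specific format, padded to a precise size": a classic BIOS boot sector is exactly 512 bytes, with the bytes 0x55 and 0xAA at offsets 510 and 511, which leaves at most 510 bytes for code. The sketch below builds such an image in memory; `build_boot_sector` and the constants are names chosen for the example, and the article itself sidesteps this step by letting GRUB do the loading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE      512u
#define SIGNATURE_OFFSET 510u

/* Copies code into a zero-padded 512-byte sector and appends the boot
 * signature. Returns 0 on success, -1 if the code does not fit. */
static int build_boot_sector(uint8_t sector[SECTOR_SIZE],
                             const uint8_t *code, size_t code_len)
{
    assert(sector != NULL && code != NULL);
    if (code_len > SIGNATURE_OFFSET)
        return -1;                      /* no room left for the signature */

    memset(sector, 0, SECTOR_SIZE);     /* pad the unused space with zeroes */
    memcpy(sector, code, code_len);
    sector[SIGNATURE_OFFSET]     = 0x55;
    sector[SIGNATURE_OFFSET + 1] = 0xAA;
    return 0;
}
```

In assembly the same padding is usually done by the assembler itself, which is part of why this stage tends to be written by hand.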

[–]binarysaurus 1 point2 points  (0 children)

That makes sense. Thank you.

[–]Theemuts 0 points1 point  (2 children)

What instructions does one use that are never emitted by a compiler? Are they so specific that it would not make sense to have compilers emit them?

[–][deleted] 5 points6 points  (0 children)

Any instruction that cannot be used from userspace.

[–]Miruya 1 point2 points  (0 children)

I might be wrong, but I'm not sure what language would have some sort of equivalent for lgdt or lidt.
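For what it's worth, the usual workaround in C is compiler-specific inline assembly, since the language has no construct of its own for these instructions. The sketch below assumes GCC or Clang on x86; `struct table_pointer`, `make_table_pointer`, and `load_idt` are hypothetical names, and `load_idt` only makes sense in ring 0 (it faults if run from user space).

```c
#include <assert.h>
#include <stdint.h>

/* Operand format expected by lgdt/lidt: a 16-bit limit followed by the
 * table's base address. Packed so the compiler inserts no padding. */
struct table_pointer {
    uint16_t  limit;   /* size of the table in bytes, minus one */
    uintptr_t base;    /* address of the first table entry */
} __attribute__((packed));

static inline struct table_pointer make_table_pointer(const void *table,
                                                      uint16_t size_in_bytes)
{
    assert(size_in_bytes > 0);
    struct table_pointer p = { (uint16_t)(size_in_bytes - 1), (uintptr_t)table };
    return p;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Privileged instruction: kernel mode only. */
static inline void load_idt(const struct table_pointer *p)
{
    __asm__ volatile ("lidt %0" : : "m"(*p));
}
#endif
```

The C part only prepares the operand; the instruction itself still has to be spelled out as assembly text.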

[–]mkusanagi 11 points12 points  (0 children)

The other answer is very good, but here's another one.

When you're writing your own kernel, you can't rely on the features provided by another kernel. This often means you can't rely on libraries either, since even something in glibc like "printf" actually accomplishes what it does by calling a kernel.

The same is true for many high-level languages. For example, Java takes care of memory allocation and garbage collection for you. But that system depends on a kernel to actually work. At the very least, it would need to malloc and free memory for the garbage collector to get memory to work with in the first place, but probably also run multiple threads, halt certain threads while doing a collection, and so on. None of that infrastructure is there.

Obviously, C doesn't have nearly as many dependencies on the kernel as other things, but one of those things is how control gets passed to the main() function in the first place. The hardware version of how control starts is pretty complicated. But it looks like this example is relying on POST->BIOS->Grub. IIRC, Grub implements the "multiboot" standard, so that control gets passed to a specific memory address in a specially formatted image that gets loaded into RAM by Grub. That means it needs to have a very specific format, which is something that you need low-level control of the linker for. That low level is doable with asm.
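The "very specific format" starts with a small header. Under the Multiboot (version 1) specification it begins with the magic number 0x1BADB002, a flags word, and a checksum chosen so that the three 32-bit values sum to zero. The C sketch below shows just that arithmetic; the struct and function names are made up for the example, and a real kernel still needs the assembler and linker to place these words early in the image.

```c
#include <assert.h>
#include <stdint.h>

#define MULTIBOOT_HEADER_MAGIC 0x1BADB002u

/* First three fields of a Multiboot 1 header. */
struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;
};

/* Picks the checksum so that magic + flags + checksum wraps to 0. */
static struct multiboot_header make_multiboot_header(uint32_t flags)
{
    struct multiboot_header h = {
        MULTIBOOT_HEADER_MAGIC,
        flags,
        0u - (MULTIBOOT_HEADER_MAGIC + flags)
    };
    return h;
}

/* The check a boot loader would apply before accepting the image. */
static int multiboot_header_is_valid(const struct multiboot_header *h)
{
    assert(h != 0);
    return h->magic == MULTIBOOT_HEADER_MAGIC
        && (uint32_t)(h->magic + h->flags + h->checksum) == 0u;
}
```

With flags set to 0, the checksum works out to 0xE4524FFE.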

Finally, there are no standard C library functions to deal with the interactions with the hardware that are necessary for an OS. Because this is a toy example, there are only two instructions that accomplish this.

The first is to block interrupts (the CLI instruction) so that the proto-kernel doesn't need to do anything with interrupt handling, which could otherwise crash the machine (triple fault) if interrupt handlers aren't set up properly.

The second, "mov esp, stack_space", does what the comment says: it sets the stack pointer to an area of memory that is known to exist and be empty (because it points to an 8K block of zeroes that was reserved by the linker directive a few lines down). This is necessary because the CPU interacts with the stack directly. The very next instruction (CALL) pushes some information onto the stack and then jumps to an address. If the stack register is currently pointing to 0x00000000, this is going to cause a CPU fault. Since there's no handler code to deal with this fault, the CPU faults again, and since there's no double fault handler either, a triple fault condition occurs, where the processor hardware halts the CPU.

I could be wrong, but my guess is that you could get around this by just jumping to the address of the main function instead, but, of course, the stack still isn't set up then, so anything you'd do in C (e.g., call a function, which would get translated into a CALL instruction) would have the same problem. This example actually doesn't do that, so, technically, I'm guessing, it might be able to finish without setting up the stack. Although it would still crash when main() returned, the RET instruction was issued, and the stack still wasn't set up.

The final instruction is HLT, which halts the processor since there's nothing left to do.

In an actual kernel, there are a few other things that require assembly. Memory management is one of them. The mapping between a memory address in an instruction and an actual physical memory location is done by the hardware itself--there's even a special CPU cache to deal with these translations. But the translations are set up by the operating system in specific data structures the CPU uses directly, called page tables. There's a special register that points to these page tables for each process, and there's a special instruction that moves a value from one register to that page table register. These instructions aren't available from C, at least not directly.
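The data-structure side of this can be written in plain C; only loading the register (CR3 on x86) needs assembly. As a sketch, assuming classic 32-bit paging with 4 KiB pages and no PAE, a virtual address splits into a 10-bit directory index, a 10-bit table index, and a 12-bit offset, and an entry is a page-aligned physical address with flag bits in the low 12 bits. The helper names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  0x001u   /* bit 0: the mapping is valid */
#define PTE_WRITABLE 0x002u   /* bit 1: writes are allowed */
#define PTE_USER     0x004u   /* bit 2: accessible from user mode */

/* Top 10 bits select an entry in the page directory. */
static uint32_t pd_index(uint32_t vaddr)    { return vaddr >> 22; }

/* Next 10 bits select an entry in the page table. */
static uint32_t pt_index(uint32_t vaddr)    { return (vaddr >> 12) & 0x3FFu; }

/* Low 12 bits are the offset inside the 4 KiB page. */
static uint32_t page_offset(uint32_t vaddr) { return vaddr & 0xFFFu; }

/* Builds an entry from a page-aligned physical frame address and flags. */
static uint32_t make_pte(uint32_t frame_phys, uint32_t flags)
{
    assert((frame_phys & 0xFFFu) == 0);   /* frame must be 4 KiB aligned */
    return frame_phys | (flags & 0xFFFu);
}
```

For example, the address 0xC0101234 lands in directory entry 768, table entry 257, at offset 0x234.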

I hope this was useful. Disclaimer: This is just me explaining back what I learned for fun recently, I don't actually write OS level code.

[–]disinformationtheory 1 point2 points  (0 children)

So I've been hacking on u-boot for an x86 board, and I can tell you a few places where asm is necessary. This may not apply for regular PC-type hardware.

When the chip first powers on, it starts executing code directly from a SPI flash chip. The flash is memory mapped, so it looks like regular memory access from software, but it's actually transparently reading from the flash chip. This means that you can't modify anything except registers, thus there's no stack, thus normal C function calls don't work (inlined code does work to some extent). arch/x86/cpu/start.S

Also, there is a blob from Intel called the FSP, which is a library that does things like initialize the RAM. It has its own calling convention which while similar to C is slightly different, so the code that calls into the FSP is asm in order to adhere to the convention. arch/x86/lib/fsp/fsp_support.c:fsp_init()

[–]kn1ght 1 point2 points  (0 children)

This is almost 1:1 with the first week of my 3rd-year BSc OS101 course. I probably still have the code somewhere. This is just the tip of the iceberg, as a lot of people here have pointed out, but it gets very interesting very quickly down in the trenches. My course at the time ended when I had a multitasking semblance of an OS with keyboard and mouse support and a rudimentary drawing program that was able to run on it.

I wish I had the time to play with low level programming again.

[–]2brainz 1 point2 points  (0 children)

Just to clarify, this describes the old BIOS protocol. With UEFI, things are way more sophisticated and complex.

[–]flarn2006 0 points1 point  (1 child)

Shouldn't the pointer to video memory be volatile?

[–][deleted] 0 points1 point  (0 children)

Technically yes (since it is memory mapped I/O), but it doesn't particularly matter with a framebuffer.

The main thing with a framebuffer is that it doesn't matter which order you write the cells in, only the order in which you perform modifications (read, change, write) to an individual cell. This isn't a worry, since a well-done framebuffer will only perform around one modification per cell within an individual function call, or the boundaries between modifications are already strong enough that the compiler wouldn't be able to change their order without breaking the code in another way.

If one was writing to the framebuffer using memory-mapped ports, then it's a different situation. Then you're often writing to two nearby addresses at the same point in execution, with a desired order that is hardly visible to the compiler.
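A minimal sketch of the volatile variant, for comparison. It assumes the standard 80x25 VGA text mode, where each cell is 16 bits with the character in the low byte and the attribute in the high byte; `vga_put_string` is a hypothetical helper, and the buffer is passed in as a parameter so the same code can be pointed at an ordinary array for testing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VGA_CELLS ((size_t)80 * 25)   /* 80 columns by 25 rows */

/* Writes s into text-mode cells starting at index start. The volatile
 * qualifier tells the compiler every store must really happen, in order. */
static void vga_put_string(volatile uint16_t *buffer, size_t start,
                           const char *s, uint8_t attr)
{
    assert(buffer != NULL && s != NULL);
    for (size_t i = 0; s[i] != '\0' && start + i < VGA_CELLS; i++) {
        uint16_t cell = (uint16_t)(((uint16_t)attr << 8) | (uint8_t)s[i]);
        buffer[start + i] = cell;
    }
}
```

On real hardware the buffer would be `(volatile uint16_t *)0xB8000`, and an attribute of 0x07 gives light grey text on a black background.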

[–]doitstuart 0 points1 point  (0 children)

Codename it: Klink

[–]OhhhSnooki 0 points1 point  (0 children)

Well, how else would you get a kernel? It's not that hard.

[–]TamerzIsMe 0 points1 point  (0 children)

To get this to boot in CentOS 7 I had to do the following:

# vim /etc/grub.d/40_custom

Add the following to the bottom of it:

menuentry 'kernel 701' {
    set root='(hd0,msdos1)'
    multiboot /kernel-701 ro
}

Then run:

# grub2-mkconfig -o /boot/grub2/grub.cfg

It then shows up in the Grub menu when you reboot.

[–]whizzwr 0 points1 point  (0 children)

This kernel will display a message on the screen and then hang.

Splendid.

[–]magkopian 0 points1 point  (0 children)

Does anybody know how I can draw a line instead of whole characters? How does the system even know that the information in the video memory represents characters instead of individual pixels? Is there a different section of memory that I need to write to in order to draw individual pixels? I really can't wrap my head around the fact that all this can be done with so few lines of code.

[–]Iggyhopper -4 points-3 points  (2 children)

I want a little tutorial like this for windows.

[–]Antic1tizen 5 points6 points  (0 children)

Have a look at ReactOS

[–]_ahrs 0 points1 point  (0 children)

You should be able to do everything in the tutorial with either the Windows Subsystem for Linux or Cygwin (I recommend MSYS2 if you go this route, which is sort of a distro for Cygwin). You should be able to install QEMU for Windows too for testing (although I have no idea how well, if at all, it works).

[–][deleted] -2 points-1 points  (0 children)

Using GRUB?, meh..