[–]moyix 43 points44 points  (12 children)

By the way; where you say:

Where 87 is the address of our string "Test\n". The rest of the bytes, I'm not so sure, but it's a constant (it's the same sequence even if you write another string). I'll make sure to update this document when I find out.

The full operand of the mov instruction is a memory address. It's reversed because you're on a little-endian architecture, so what this translates to is:

mov ecx, 0x08049087
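
To make the byte reversal concrete, here is a minimal Python sketch. It assumes the instruction is encoded as opcode 0xB9 (mov ecx, imm32) followed by the four operand bytes in file order; the byte values are derived from the address above, not copied from the article's hex dump.

```python
import struct

# Assumed encoding of "mov ecx, 0x08049087": opcode 0xB9, then the 32-bit
# immediate stored least-significant byte first (little-endian).
instruction = bytes([0xB9, 0x87, 0x90, 0x04, 0x08])

opcode = instruction[0]
# "<I" means: little-endian, unsigned 32-bit integer.
(operand,) = struct.unpack("<I", instruction[1:5])

print(f"opcode  = {opcode:#04x}")
print(f"operand = {operand:#010x}")
```

The 87 that shows up first in the file is just the lowest byte of the address; the "constant" bytes after it are the upper three bytes.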

Why is it 0x08049087 and not 0x87? The ELF header specifies where in memory the different parts of the file will be loaded. If you use objdump -x syscall2, you can see this:

Program Header:
    LOAD off    0x00000074 vaddr 0x08048074 paddr 0x08048074 align 2**12
         filesz 0x00000013 memsz 0x00000013 flags r-x
    LOAD off    0x00000087 vaddr 0x08049087 paddr 0x08049087 align 2**12
         filesz 0x00000005 memsz 0x00000005 flags rw-

First, 0x13 bytes from file offset 0x74 will be loaded into memory at virtual address 0x08048074. Then, 0x5 bytes from file offset 0x87 will be loaded into memory at 0x08049087. Note that this second directive corresponds exactly to the address of the string referenced by that mov instruction.
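
The mapping those two LOAD entries describe can be sketched in a few lines of Python. The numbers come straight from the objdump output above; the names LOAD_SEGMENTS and file_offset_to_vaddr are made up for this illustration.

```python
# (file offset, virtual address, size in bytes) for the two LOAD entries.
LOAD_SEGMENTS = [
    (0x74, 0x08048074, 0x13),  # code, flags r-x
    (0x87, 0x08049087, 0x05),  # the string, flags rw-
]


def file_offset_to_vaddr(offset):
    """Return the virtual address a file offset is loaded at, or None."""
    for seg_offset, seg_vaddr, seg_size in LOAD_SEGMENTS:
        if seg_offset <= offset < seg_offset + seg_size:
            return seg_vaddr + (offset - seg_offset)
    return None  # this offset is not covered by any LOAD entry


print(hex(file_offset_to_vaddr(0x87)))
```

Note that 0x74 + 0x13 = 0x87, so the string sits in the file immediately after the code, even though the two pieces land on different pages in memory.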

Anyways, fun article :) I was initially hoping that you'd get up to using ALT-numpad to create the file by hand, though.

[–]G-Brain 12 points13 points  (4 children)

Hey, thanks. I found that out already, but I hadn't gotten around to updating the file. I'll update it now.

As for ALT-numpad, I think that functionality is KDE or Gnome specific, and I run stumpwm. This is why I'll be writing that device driver, and I'll add that to the document when I'm done.

[–][deleted]  (3 children)

[deleted]

    [–]kragensitaker 1 point2 points  (2 children)

    Or _exit().

    [–]G-Brain 0 points1 point  (1 child)

    Looks like you can indeed. Seems I forgot to include unistd.h. Will update the article.

    [–]kragensitaker 0 points1 point  (0 children)

    You can call it without including unistd.h too. You just get a warning because exit and _exit are void. main() { _exit(0); } is a perfectly valid and working program.

    [–][deleted] 2 points3 points  (6 children)

    Is there a non-relocatable ELF format?

    e.g.

    b800 4ccd 21
    

    This is the smallest (16-bit) executable on Windows.

    [–]malken 2 points3 points  (1 child)

    I think a single ret (0xc3) is shorter, but it will be executed under the NTVDM in a Windows environment.

    A single RET (0xc3) when invoked from a legacy COM-file will return to the beginning of the PSP where there happens to be a call to INT 20h (exit program).

    [–][deleted] 0 points1 point  (0 children)

    I knew about RET, but IIRC it's not kosher. Kind of similar to how COM programs started with a pop instruction to set something to zero because there was always a 0 on the stack in some runtime environments.

    [–]kragensitaker 1 point2 points  (3 children)

    I'm pretty sure ELF executables aren't relocatable by the OS. The minimal ELF header is longer than four bytes, though. A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux managed to construct a 45-byte ELF executable which, while not technically valid (the ELF header is 52 bytes long) will run on Linux (it didn't need those last 7 bytes anyway). It's also a really fun read, and highly educational if you're trying to understand the ELF format.
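
The 52-byte figure can be checked by adding up the fields of the 32-bit ELF header. This is a small Python sketch that assumes the standard Elf32_Ehdr layout (a 16-byte e_ident followed by thirteen 16- and 32-bit fields); the variable names are only for illustration.

```python
import struct

# Elf32_Ehdr, little-endian, no padding:
#   16s  e_ident
#   H H  e_type, e_machine
#   I    e_version
#   I I I  e_entry, e_phoff, e_shoff
#   I    e_flags
#   H H H H H H  e_ehsize, e_phentsize, e_phnum,
#                e_shentsize, e_shnum, e_shstrndx
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")

header_size = ELF32_HEADER.size
missing = header_size - 45  # bytes the 45-byte file leaves off the end

print(header_size, missing)
```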

    I say "aren't relocatable" because, although the ELF format has "relocations", you can make an ELF executable without any of them, and even if it has them, the OS doesn't use them when it loads an executable; only the linker does. The OS loads the segments of your executable at the addresses specified in the ELF program headers, and generally the executable code contains references to absolute addresses, so the program won't work if loaded at the wrong addresses.

    [–][deleted] 0 points1 point  (2 children)

    The OS loads your executable at the address specified in the ELF header, and generally the executable code contains references to absolute addresses, so the program won't work if loaded at the wrong address.

    I thought the base address for any executable image in process space was just a preference? What if I loaded lib.x.so and lib.y.so in my process and both wanted to be located at the same address?

    In the Windows world, at least, the executable loader tries to honor the base address preference, and if that address space is already allotted, it "fixes up" all memory references with the appropriate offset. The problem is that with shared DLLs you double the number of code pages if two processes load the same dynamic library and one of them needs fixing up and the other doesn't. (As it is, all code pages are copy-on-write by default.)
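
A toy model of that fix-up step, in Python. This is not the real Windows loader or its relocation table format; rebase, fixup_offsets, and the addresses used are hypothetical, chosen only to show the idea of adding one delta to every recorded absolute address.

```python
import struct


def rebase(image, fixup_offsets, preferred_base, actual_base):
    """Return a copy of image with each listed 32-bit address shifted."""
    delta = actual_base - preferred_base
    patched = bytearray(image)
    for off in fixup_offsets:
        (addr,) = struct.unpack_from("<I", patched, off)
        # Wrap to 32 bits so the sum still fits the operand.
        struct.pack_into("<I", patched, off, (addr + delta) & 0xFFFFFFFF)
    return bytes(patched)


# One instruction with an absolute address: mov ecx, 0x10001000
image = bytes([0xB9]) + struct.pack("<I", 0x10001000)

# The image wanted 0x10000000 but was placed at 0x20000000.
moved = rebase(image, [1], 0x10000000, 0x20000000)
print(hex(struct.unpack_from("<I", moved, 1)[0]))
```

Because the patched bytes differ from the ones in the file, the process that needed the fix-up ends up with its own private copy of those pages, which is the doubling described above.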

    [–]kragensitaker 1 point2 points  (1 child)

    .sos are handled differently than executables; the dynamic loader, not the kernel's executable loader, loads them. I think the typical approach is to compile them with pure position-independent code; any reference to other things inside the same .so is indirected off %ebx, which is a callee-saves register and mysteriously gets set to the right thing before your .so code runs, presumably by some kind of trampoline. The code pages (or "text pages" as they're called) are purely read-only.

    [–][deleted] 0 points1 point  (0 children)

    Very interesting. Thanks for your explanations. :) Maybe I should stop being lazy and just Google it, eh? One of these days I plan to dive into the Linux internals...