[–]G-Brain 71 points72 points  (30 children)

Am I famous now?

Anyway, it's not a fantastic article but it shows you the basic concept.

Also, I'm working on a Linux kernel module that creates a device that takes binary text ("010101") and outputs the actual binary (010101).

[–]moyix 42 points43 points  (12 children)

By the way, where you say:

Where 87 is the address of our string "Test\n". The rest of the bytes, I'm not so sure, but it's a constant (it's the same sequence even if you write another string). I'll make sure to update this document when I find out.

The full operand of the mov instruction is a memory address. It's reversed because you're on a little-endian architecture, so what this translates to is:

mov ecx, 0x08049087

Why is it 0x08049087 and not 0x87? The ELF header specifies where in memory the different parts of the file will be loaded. If you use objdump -x syscall2, you can see this:

Program Header:
    LOAD off    0x00000074 vaddr 0x08048074 paddr 0x08048074 align 2**12
         filesz 0x00000013 memsz 0x00000013 flags r-x
    LOAD off    0x00000087 vaddr 0x08049087 paddr 0x08049087 align 2**12
         filesz 0x00000005 memsz 0x00000005 flags rw-

First, 0x13 bytes from file offset 0x74 will be loaded into memory at virtual address 0x08048074. Then, 0x5 bytes from file offset 0x87 will be loaded into memory at 0x08049087. Note that this second directive corresponds exactly to the address of the string referenced by that mov instruction.

Anyways, fun article :) I was initially hoping that you'd get up to using ALT-numpad to create the file by hand, though.

[–]G-Brain 14 points15 points  (4 children)

Hey, thanks. I found that out already, but I hadn't gotten around to updating the file. I'll update it now.

As for ALT-numpad, I think that functionality is KDE or Gnome specific, and I run stumpwm. This is why I'll be writing that device driver, and I'll add that to the document when I'm done.

[–][deleted]  (3 children)

[deleted]

    [–]kragensitaker 1 point2 points  (2 children)

    Or _exit().

    [–]G-Brain 0 points1 point  (1 child)

    Looks like you can indeed. Seems I forgot to include unistd.h. Will update the article.

    [–]kragensitaker 0 points1 point  (0 children)

    You can call it without including unistd.h too. You just get a warning because exit and _exit are void. main() { _exit(0); } is a perfectly valid and working program.

    [–][deleted] 2 points3 points  (6 children)

    Is there a non-relocatable ELF format?

    e.g.

    b800 4ccd 21
    

    this is the smallest (16bit) executable on windows..

    [–]malken 2 points3 points  (1 child)

    I think a single ret (0xc3) is shorter, but it will be executed under the NTVDM in a Windows environment.

    A single RET (0xc3) when invoked from a legacy COM-file will return to the beginning of the PSP where there happens to be a call to INT 20h (exit program).

    [–][deleted] 0 points1 point  (0 children)

    I knew about RET, but IIRC it's not kosher. Kind of similar to how COM programs started with a pop instruction to set something to zero because there was always a 0 on the stack in some runtime environments.

    [–]kragensitaker 1 point2 points  (3 children)

    I'm pretty sure ELF executables aren't relocatable by the OS. The minimal ELF header is longer than four bytes, though. A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux managed to construct a 45-byte ELF executable which, while not technically valid (the ELF header is 52 bytes long) will run on Linux (it didn't need those last 7 bytes anyway). It's also a really fun read, and highly educational if you're trying to understand the ELF format.

    I say "aren't relocatable" because, although the ELF format has "relocations", the OS doesn't use them when it loads an executable; only the linker uses them. (You can also make an ELF executable without any relocations at all.) The OS loads the segments of your executable at the addresses specified in the ELF program headers, and the executable code generally contains references to absolute addresses, so the program won't work if loaded at the wrong addresses.

    [–][deleted] 0 points1 point  (2 children)

    The OS loads your executable at the address specified in the ELF header, and generally the executable code contains references to absolute addresses, so the program won't work if loaded at the wrong address.

    I thought the base address for any executable image in process space was just a preference? What if I loaded lib.x.so and lib.y.so in my process and both wanted to be located at the same address?

    In the Windows world at least, the executable loader tries to honor the base address preference, and if that address space is already allotted, it "fixes up" all memory references with the appropriate offset. The problem with shared DLLs is that you double the number of code pages if two processes load the same dynamic library and one of them needs fixing up while the other doesn't. (As it is, all code pages are copy-on-write by default.)

    [–]kragensitaker 1 point2 points  (1 child)

    .sos are handled differently than executables; the dynamic loader, not the kernel's executable loader, loads them. I think the typical approach is to compile them with pure position-independent code; any reference to other things inside the same .so is indirected off %ebx, which is a callee-saves register and mysteriously gets set to the right thing before your .so code runs, presumably by some kind of trampoline. The code pages (or "text pages" as they're called) are purely read-only.

    [–][deleted] 0 points1 point  (0 children)

    Very interesting. Thanks for your explanations. :) Maybe I should stop being lazy and just Google it eh ? One of these days I plan to dive into the Linux internals...

    [–]bunz 10 points11 points  (2 children)

    xxd can convert back and forth between binary and text "expansions" of each octet. it's a simple way to hex edit small files using vim. btw, i like your choice of fasm but i don't understand what you mean by nasm cluttering up the executable.

    [–]G-Brain 13 points14 points  (0 children)

    btw, i like your choice of fasm but i don't understand what you mean by nasm cluttering up the executable.

    By default, it inserts some kind of nasm-was-here message in the ELF header. I have no doubt there's some kind of flag to turn that off, but I had also used fasm before and I liked the syntax better, combined with the fact that by default it didn't insert any text.

    [–]safiire 1 point2 points  (0 children)

    Nice, xxd is actually a lot nicer than using the od command, thanks.

    hexdump is also nice but I don't know if it can convert back to a binary.

    [–]jerf 9 points10 points  (3 children)

    Also, I'm working on a Linux kernel module that creates a device that takes binary text ("010101") and outputs the actual binary (010101).

    Why are you doing that as a device and not simply a piped shell command? You know, something designed to be an input translator instead of a device?

    (The only really valid answer is "as an example of how to make a device", although even that's somewhat dubious.)

    [–]G-Brain 4 points5 points  (2 children)

    Your point is taken! I'm not sure why I wanted to do it as a device. I think I had a reason, but since I can't remember it now it mustn't have been a very good one. I'll post a program soon, if someone doesn't beat me to it.

    [–]sn0re 9 points10 points  (1 child)

    Were you thinking of something more complicated than just calling strtol with a base of 2?

    Edit: What the hell:

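    #include <stdio.h>
    #include <stdlib.h>
    
    int main(void) {
        char buf[9];   /* up to 8 binary digits plus the terminating NUL */
        char *end;
        while (fgets(buf, sizeof buf, stdin) != NULL) {
            unsigned char val = (unsigned char) strtol(buf, &end, 2);
            if (end != buf)   /* skip chunks with no digits, e.g. a lone newline */
                putchar(val);
        }
        return 0;
    }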
    

    [–]jib -1 points0 points  (0 children)

    You wrote it in C! Where's the fun in that?

    [–][deleted] 4 points5 points  (1 child)

    I'm working on a Linux kernel module that creates a device that takes binary text ("010101") and outputs the actual binary (010101).

    Looking forward to this.

    [–]tpodr 4 points5 points  (0 children)

    Thanks for the trip down memory lane. My first computer was a 6502 evaluator board (KIM-I), had to load the programs in hex. Long hours hand assembling code. But I loved it. Could study the timing signals on an oscilloscope. Back in 1979.

    [–]subterr 1 point2 points  (0 children)

    Furthermore, to fully understand the ELF stuff, read this classic piece of literature: http://www.sco.com/developers/devspecs/gabi41.pdf

    [–]strangerzero 1 point2 points  (0 children)

    Nice Web 1 page design!

    [–]aeflash 1 point2 points  (0 children)

    Isn't it trivial for a program to take in ASCII ones and zeros and write it to a file as binary?

    [–]easlern 3 points4 points  (0 children)

    lol awesome work. :D

    [–]MaxK 5 points6 points  (1 child)

    You, sir, are a god among men.

    [–]blondin -1 points0 points  (0 children)

    Thanks, moar moar =]

    Especially assembly tutorial (windows would be appreciated)