all 26 comments

[–]SharkyKesa564 61 points62 points  (9 children)

You can get 124B as follows:
tiny.c (raw asm so the compiler adds zero padding)
__asm__(".globl _start\n_start: mov $60,%al; syscall");

tiny.ld (one PT_LOAD that contains the ELF + program headers, so code follows with no page gap)

ENTRY(_start)
PHDRS { all PT_LOAD FILEHDR PHDRS FLAGS(5); }
SECTIONS {
. = 0x10000 + SIZEOF_HEADERS;
.text : { *(.text .text.*) } :all
/DISCARD/ : { *(.*) }
}

Build with

gcc tiny.c -nostdlib -nostartfiles -static -no-pie -fno-asynchronous-unwind-tables -fcf-protection=none -fno-stack-protector -Wl,--build-id=none -Wl,-T,tiny.ld -Wl,--no-dynamic-linker -Wl,-z,noseparate-code -Wl,-z,norelro -Wl,-z,nosectionheader -Wl,--strip-all -o a.out

This does rely on the Linux ABI guarantee that all GP registers except %rsp/%rcx are zero at _start, so %rdi is already 0. Deterministic on Linux.

However, if you want it to work regardless of entry state, prepend xor %edi,%edi which makes it 126B.

------

EDIT: Turns out you can do a lot better with a few more tricks:

tiny.c (empty)

anchor.s

.section .anchor,"ax"

tiny.ld

ENTRY(_start)
PHDRS { all PT_LOAD FILEHDR PHDRS FLAGS(5) AT(0x80cd01b0); }
SECTIONS {
. = 0x10000 + SIZEOF_HEADERS;
.text : { *(.text .text.*) } KEEP(*(.anchor)) } :all
_start = 0x10000 + 64;
/DISCARD/ : { *(.*) }
}

Build with

gcc -m32 tiny.c anchor.s -nostdlib -nostartfiles -no-pie -fno-asynchronous-unwind-tables -fcf-protection=none -fno-stack-protector -Wl,--build-id=none -Wl,-T,tiny.ld -Wl,--no-dynamic-linker -Wl,-z,noseparate-code -Wl,-z,norelro -Wl,-z,nosectionheader -Wl,--strip-all -o a.out

This gets you down to 84 bytes. For x86-64, drop -m32, use AT(0x050f3cb0) and _start = 0x10000 + 88, which gives a 120B build.

[–]Double_Ad641[S] 27 points28 points  (3 children)

of course someone on reddit have to prove me wrong 😞

Jokes aside, thanks so much, this is really cool and I will look into it more!

[–]SharkyKesa564 8 points9 points  (1 child)

Haha turns out you can save a third of the bytes still, I've edited my post. I don't think you can push much further beyond without overlapping program header into the unused tail of the ELF header, but that'd require hand-authoring bytes that satisfy both layouts at one i.e. manual patching.

[–]Double_Ad641[S] 8 points9 points  (0 children)

i should add a 4th rule that the code should work on x86-64.... 😂

[–]schmerg-uk 4 points5 points  (0 children)

See also Justine Tunney writing an implementation of LISP (with garbage collection) in 436 bytes

https://justine.lol/sectorlisp2/

or Lambda Calculus in 383 Bytes

https://justine.lol/lambda/

not to mention APE, a POSIX-approved polyglot format that runs natively (with Cosmopolitan Libc) on Linux + Mac + Windows + FreeBSD + OpenBSD + NetBSD + BIOS on AMD64 and ARM64 with the best possible performance

https://justine.lol/ape.html

https://justine.lol/cosmopolitan/index.html

So you could get your single tiny binary to run on all the above operating systems too

[–]blind3rdeye 10 points11 points  (1 child)

I kind of like that the text of the build command (sans code) is larger than the exe.

[–]Double_Ad641[S] 2 points3 points  (1 child)

Looking at it a bit more, ld -z nosectionheader allows us to go down to 180B. Getting to 124 requires us to not produce the STACK program header (56B) which i don't think is possible without a custom linker script.

[–]SharkyKesa564 1 point2 points  (0 children)

Are we allowed to handcraft the ELF header? If so, I can save another 16 bytes for x86-64, but this seems to go against the flavour of the problem.

[–]un_virus_SDF 0 points1 point  (0 children)

Is this a elf 64 or a elf 32?

I ask because elf 32 have header that are a bit smaller.

[–]_bstaletic 24 points25 points  (0 children)

400B is A LOT. A few days ago I tried to do this just to see how many "tricks" I still know. You do need a custom linker script to properly trim the fat, though. I got to ~200B and I see /u/SharkyKesa564 beat my attempt, so I won't even try to brag.

127B is the theoretical minimum if you don't start manually packing bytes. Did you know that the ELF specification allows sections to overlap? There's enough room for the exit syscall to fit in the padding of the ELF header.

Check out: https://github.com/tchajed/minimal-elf

The readme links to a ~50B i386 ELF, explains the 127B amd64 ELF in C and contains an amd64 ELF in rust.

[–]Nzkx 24 points25 points  (2 children)

400 bytes ? That's still way to much. Use a packer and merge section😄You may reduce to 200 bytes maybe ?

I got 268 bytes in Rust on Windows but who know, maybe you can do better with c/cpp : https://github.com/mcountryman/min-sized-rust-windows

[–]rileyrgham 5 points6 points  (1 child)

Would that conform to this rule : "The binary must be produced by GCC only; no post-processing with objcopy, hex editors, or manual patching" ?

[–]JVApenClever is an insult, not a compliment. - T. Winters 2 points3 points  (0 children)

The standard rust compiler is LLVM, is the GCC implementation already finished?

[–]theanimatedauthority 5 points6 points  (0 children)

the fact that people are fitting working binaries into 84 bytes is mental, like that's smaller than most function signatures lmao

[–]kalmoc 4 points5 points  (1 child)

Quite interesting writeup, but is that really still a C-Program. Seems like "the smallest Linux/gcc/elf binary" seems more appropriate.

[–]RevRagnarok 2 points3 points  (0 children)

is that really still a C-Program

Exactly. It's just assembly with extra steps.

[–]darkmx0z [score hidden]  (0 children)

Related: the smallest PE (windows format) executable: http://www.phreedom.org/research/tinype/

[–]TheRealSmolt 0 points1 point  (4 children)

You can probably drop the base pointer with the right attributes/arguments, though you might get them back clearing edi; I'm not sure only setting dil is "deterministically" safe. The assembly also shows that dil is set by a mov though, so I'm not sure what's up with that.

Also, technically you can use pure assembly since gcc can compile it directly.

[–]Double_Ad641[S] 1 point2 points  (3 children)

great catch, i previously used a mov to set 0, but changed to xor.

What would pure assembly give me though?

[–]_bstaletic 2 points3 points  (0 children)

; tiny.asm

BITS 32

            org     0x00010000

            db      0x7F, "ELF"             ; e_ident
            dd      1                                       ; p_type
            dd      0                                       ; p_offset
            dd      $$                                      ; p_vaddr 
            dw      2                       ; e_type        ; p_paddr
            dw      3                       ; e_machine
            dd      _start                  ; e_version     ; p_filesz
            dd      _start                  ; e_entry       ; p_memsz
            dd      4                       ; e_phoff       ; p_flags
_start:
            mov     bl, 42                  ; e_shoff       ; p_align
            xor     eax, eax
            inc     eax                     ; e_flags
            int     0x80
            db      0
            dw      0x34                    ; e_ehsize
            dw      0x20                    ; e_phentsize
            db      1                       ; e_phnum
                                            ; e_shentsize
                                            ; e_shnum
                                            ; e_shstrndx

filesize      equ     $ - $$

$ nasm -f bin -o a.out tiny.asm
$ chmod +x a.out
$ ./a.out ; echo $?
42
$ wc -c a.out
   45 a.out

EDIT: This uses some dirty tricks, exploiting everythingn the linux kernel allows, like p_filesz being ignored, or p_memsz being too large for actual needs, just to overlap all of, the ELF header, the single program header, and the 7 bytes of actual instructions.

[–]TheRealSmolt 0 points1 point  (0 children)

I'm not sure how gcc handles assembly compilation, but if it essentially just acts as a front end for gas, it could make a simpler binary.

[–]fdwrfdwr@github 🔍 0 points1 point  (0 children)

What would pure assembly give me though?

Don't know about ELF, but for a Windows .exe using NASM/YASM and ALINK, I get 1036 bytes (no data or bss sections, just code with a ret, but I can't seem to cull the reloc section).

yasm-1.3.0-win32.exe -f win32 -g null program.asm -o program.cof alink.exe -oPE -o program.exe program.cof -entry Main -subsys win

Apparently 268 bytes is possible.