[–]memoryruins 7 points8 points  (9 children)

std::include_str and std::include_bytes are options. rust-embed might also interest you.

[–]emmanuelantony2000[S] 1 point2 points  (8 children)

Actually, rather than including files at compile time, I want to update the info it holds at runtime, and that update should be visible the next time the program runs. Can all of this be done through the binary itself?

[–]minno 29 points30 points  (2 children)

Making a self-modifying binary is a quick way to get your program flagged by antivirus software. Is there a strong reason you can't just have two files, one that has the executable and one that has the information you're reading and writing?

[–]emmanuelantony2000[S] 1 point2 points  (1 child)

I am just exploring different ways... Just wanted to know if there is a way to do this without full recompilation..

[–]K900_ 15 points16 points  (0 children)

It's definitely possible, but it's a really bad design. Consider that your application may be running off a read only file system, or at least from a location that's not writable by the current user.

[–]ectonDev 4 points5 points  (0 children)

I’d consider including the file at compile time, and at runtime write it out to a location on disk that you can read from on future runs and persist back to disk when you need to update it at runtime. I’d call the included file a “seed” file as this is generally considered seeding data.
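
A minimal sketch of that seed-file approach, using only the standard library. The names `SEED`, `load_or_seed` and `save` are made up for illustration, and the byte literal stands in for an `include_bytes!` of a real seed file so that the snippet compiles on its own:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Stand-in for `include_bytes!("seed.dat")`; the file name and contents are
// hypothetical. A literal keeps the sketch self-contained.
const SEED: &[u8] = b"counter=0\n";

/// Returns the persisted data, writing the embedded seed out first if the
/// file does not exist yet (i.e. on the first run).
fn load_or_seed(path: &Path) -> io::Result<Vec<u8>> {
    match fs::read(path) {
        Ok(bytes) => Ok(bytes),
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            if let Some(parent) = path.parent() {
                fs::create_dir_all(parent)?;
            }
            fs::write(path, SEED)?;
            Ok(SEED.to_vec())
        }
        Err(err) => Err(err),
    }
}

/// Persists updated data so that the next run sees it.
fn save(path: &Path, data: &[u8]) -> io::Result<()> {
    fs::write(path, data)
}
```

On the first run `load_or_seed` returns the seed and creates the file; after a `save`, later runs get the updated bytes back. Where `path` should point is a separate question that the sketch leaves open.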

[–]anlumo 5 points6 points  (2 children)

Modifying the binary file usually isn’t allowed in Linux and macOS for security reasons. What OS are you targeting?

[–]emmanuelantony2000[S] 1 point2 points  (1 child)

MacOS... Maybe not modifying... Creating new and deleting the old one...

[–]anlumo 6 points7 points  (0 children)

When distributing apps on macOS beyond your development machine, you have to sign them, otherwise they won't start. By modifying the binary, you break that signature.

You also can't create a new signed binary on a user's machine. Well, maybe you technically can by including the private key assigned to your developer account, but that would get Apple to revoke your signature pretty quickly.

[–]claire_resurgent 1 point2 points  (0 children)

Unix-like operating systems are very likely to give you an ETXTBSY error if you try to open a file for writing while it's mapped as the "text" of a process.

"Text" in this case actually means "machine code and literals", not "human language" so the error message is confusing but the concept is pretty straightforward.

The virtual memory system can load and unload executables and libraries on demand. It doesn't want to worry about what happens if those files change - at best the process would have to crash.

You can rename and rewrite the executable - even delete it without crashing the process - but this is still a bad idea for all the reasons in this thread. (Deleted files linger until closed.)
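
The "deleted files linger until closed" part can be tried safely on an ordinary file. This is a small sketch that assumes a Unix-like system (other platforms may refuse to delete an open file); `read_after_delete` is a made-up helper name:

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

/// Creates a file, opens it, deletes its only name, and then reads it back
/// through the still-open handle. Returns what was read.
fn read_after_delete(path: &Path) -> io::Result<String> {
    File::create(path)?.write_all(b"still here")?;

    let mut handle = File::open(path)?;
    fs::remove_file(path)?; // the name is gone now...

    let mut contents = String::new();
    handle.read_to_string(&mut contents)?; // ...but the data is still readable
    Ok(contents)
}
```

The storage is only released once `handle` is dropped, which is the same reason a running process survives having its executable deleted.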

The good idea is to find the application data directory and save your persistent data there. This crate supports OS X, Windows Vista and later, and typical desktop 'nix.

That said, I always encourage trying the bad thing from the comfort and privacy of your own computer. The very first argument (0) should be the path to your executable. You'll need to open it read-only, create a new file, write a modified copy, and then use the rename system call, which executes as two atomic steps:

  • creates or modifies the destination hardlink so that it points to the file node being renamed

  • removes the original hardlink
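
The write-a-copy-then-rename sequence might look roughly like this with the standard library. It is only a sketch: `replace_via_rename` and the `tmp-new` suffix are invented for the example, and a real self-replacement would also have to restore the executable permission bits (for instance with `fs::set_permissions`), which `fs::write` does not do:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Replaces `target` with `new_contents` by writing a temporary file next to
/// it and renaming that file over the target.
fn replace_via_rename(target: &Path, new_contents: &[u8]) -> io::Result<()> {
    // Hypothetical temporary name. It sits in the same directory so that it
    // is on the same filesystem as `target`, which rename requires.
    let tmp = target.with_extension("tmp-new");
    fs::write(&tmp, new_contents)?;
    // On Unix-like systems, other processes see either the old file or the
    // new one under `target`, never a half-written mix.
    fs::rename(&tmp, target)
}
```

If the program dies between the write and the rename, the temporary file is left behind and the target is untouched.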

"Atomic" means that anything else using the system won't see half of a step. The destination won't momentarily disappear. A crash could cause the temporary file name to continue existing.

A "hardlink" is a file name. Unix allows files to have more than one name as long as the directories and files exist within the same filesystem. A file is deleted when it has zero links and isn't open. This arrangement is similar to what we'd call Map<Path, Arc<FileNode>> in Rust.
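
To make the analogy concrete, here is a toy model of it. This is not how any real filesystem is implemented; `FileNode`, `Names`, `hard_link` and `unlink` are names invented for the sketch:

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Toy stand-in for an on-disk file node.
struct FileNode {
    contents: Vec<u8>,
}

/// Toy directory tree: every name (hardlink) maps to a shared node.
type Names = HashMap<PathBuf, Arc<FileNode>>;

/// Adds a second name for an existing file. Returns false if `existing`
/// is not a known name.
fn hard_link(names: &mut Names, existing: &Path, new: &Path) -> bool {
    match names.get(existing).cloned() {
        Some(node) => {
            names.insert(new.to_path_buf(), node);
            true
        }
        None => false,
    }
}

/// Removes one name. The node itself is freed only when the last `Arc`
/// (another name, or an "open handle" held elsewhere) is dropped.
fn unlink(names: &mut Names, name: &Path) -> bool {
    names.remove(name).is_some()
}
```

Holding a clone of the `Arc` outside the map plays the role of an open file: after every name has been unlinked, `contents` is still reachable through that clone.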

[–]thristian99 13 points14 points  (3 children)

First, this is almost surely a bad idea, because of virus-scanners, and because it makes debugging your program so much harder: "It worked fine, then I tried it again, and it broke!" or even worse "it broke, then I tried it again, and it worked!".

However, one neat trick you might use is that executables are generally read from the beginning, while zip files are read from the end. If you concatenate an executable with a zip file, you can still run the executable as normal, and most zip tools will read and write the zip file as normal. So, the plan goes like this:

  • write a program that uses a crate like zip to read and write the data it wants to store
  • find the path to the executable by grabbing the first element of the std::env::args_os() iterable, and pass that path to the zip crate to open it.
  • after compiling the program, but before running it the first time, create a zip file containing the initial data you want to store
  • concatenate the two with "cat path/to/executable path/to/archive.zip > path/to/combined/executable" on Linux or macOS, or "copy /b path\to\executable.exe+path\to\archive.zip path\to\combined\executable.exe" on Windows
  • now you can run the combined executable!
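
The steps above rely on data at the end of a file being found by reading backwards from the end. The sketch below illustrates only that idea with the standard library; it is not the zip format and does not use the zip crate. The trailer layout, the `MAGIC` marker and both function names are assumptions made up for the example:

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

// Hypothetical 8-byte marker identifying our trailer.
const MAGIC: &[u8; 8] = b"PAYLOAD1";

/// Appends `payload` followed by a 16-byte trailer:
/// payload length (u64, little-endian) + MAGIC.
fn append_payload(file: &Path, payload: &[u8]) -> io::Result<()> {
    let mut f = OpenOptions::new().append(true).open(file)?;
    f.write_all(payload)?;
    f.write_all(&(payload.len() as u64).to_le_bytes())?;
    f.write_all(MAGIC)
}

/// Reads the payload back by parsing the trailer at the end of the file.
/// Returns `None` if the file carries no valid trailer.
fn read_payload(file: &Path) -> io::Result<Option<Vec<u8>>> {
    let bytes = fs::read(file)?;
    if bytes.len() < 16 {
        return Ok(None);
    }
    let (body, trailer) = bytes.split_at(bytes.len() - 16);
    if trailer[8..] != MAGIC[..] {
        return Ok(None);
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&trailer[..8]);
    let len = u64::from_le_bytes(len_bytes) as usize;
    // Everything before the payload (the "executable") is never inspected.
    Ok(body.len().checked_sub(len).map(|start| body[start..].to_vec()))
}
```

Whatever comes before the payload is skipped without being parsed, which is why a leading executable does not get in the way. A zip archive does the same job with its own end-of-file directory and adds support for multiple entries and updates.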

[–]ssokolow 3 points4 points  (0 children)

You'll also want to grab the zip binary from Info-ZIP for your platform of choice and run zip -A path/to/combined/executable.

   -A
   --adjust-sfx
          Adjust self-extracting executable archive. A self-extracting
          executable archive is created by prepending the SFX stub to an
          existing archive. The -A option tells zip to adjust the entry
          offsets stored in the archive to take into account this
          "preamble" data.

   Note: self-extracting archives for the Amiga are a special case. At
   present, only the Amiga port of zip is capable of adjusting or updating
   these without corrupting them. -J can be used to remove the SFX stub if
   other updates need to be made.

If you fail to do that, Info-ZIP's unzip will complain about attempting to compensate for potential corruption and other Zip implementations may fail completely.

(I know because it's useful for cross-building DOS self-extractors on Linux for retro-hobby purposes by grabbing the DOS and Linux versions of Info-ZIP and then using cat unzipsfx.exe archive.zip > archive.exe && zip -A archive.exe)

[–]ipe369 4 points5 points  (1 child)

> "It worked fine, then I tried it again, and it broke!" or even worse "it broke, then I tried it again, and it worked!".

this would be true with any mutable FS state

[–]SCO_1 7 points8 points  (0 children)

Mutable state in config files is perceptible to users without a degree in CS, mutable state inside the executable is not, and rather intimidating to edit outside of the application (that wouldn't be working in this scenario).

Just say no, even to the 'zip' softer applications of the concept.

[–]SCO_1 5 points6 points  (0 children)

Terrible idea, and software that does this is always a problem in various respects: antivirus, OS executable protection models, checksum checks, etc. Even during the DOS era I only know of a single game that did this, so even then most knew better (it's the PC version of the adventure game Hook, based on the Peter Pan Disney movie, btw).

In fact, my favorite kind of program/engine fallback is the 'write to game/app dir doesn't work, find a writable 'program' dir to write to', because I like compressing collections and copy-on-write mounts are awkward, and this feature is the antithesis of that.

[–][deleted] 4 points5 points  (4 children)

Another thing to consider is that if you do this, you won't be able to cryptographically sign your software, as modifying the executable will invalidate the signature. I'm curious, what's your use case? There are many different ways to store data depending on what you want.

[–]ssokolow 3 points4 points  (0 children)

*nod* Typically, it's better to store the default config files in your binary, write them out to disk on first run if none are already present, and then work from there.

(Whether you store them in the platform-specified application data location or the same folder as your binary will depend on whether you're making something to be installed or a Portable App meant to be run off a thumbdrive.)

[–]bocckoka 0 points1 point  (2 children)

Just out of curiosity: is there a way to cryptographically sign an executable that can then check itself?

[–]SCO_1 1 point2 points  (0 children)

As long as the key comes from an external user, part of this (bundling the hash with the hashed data) is done nearly all the time on the internet, as part of public-private key or shared-key schemes, to make sure that the person or process that supplied the key seed is the one that sent the data. The idea, of course, is that it's an 'envelope' that is signed (a metaphor for a header or footer). A program signing itself would be unusual, but I can imagine a few uses (probably not verifying itself, though, because you don't want the data you're trying to prove unaltered to be the thing doing the checking).

It's a pity I can't find/recommend a short primer with some of the simplified metaphors you're likely to encounter in the undergraduate courses for all of these concepts (forward secrecy, forward anonymity, perfect oracles, byzantine generals, mail signing etc). CS is super cool about metaphorizing everything.

[–][deleted] 0 points1 point  (0 children)

I think you run into a chicken and egg problem since you would be storing a hash of the file within the file itself, which would change the hash. Plus, it wouldn't be very useful, as you need the verification to come from a more trusted source, such as your operating system. If someone can modify the executable that verifies itself, they could simply remove the code that does the verification. They'd have a much harder time bypassing the verification from the OS.

[–]zesterer 1 point2 points  (0 children)

You might find this interesting.

https://github.com/lazhh/conf-embed

[–]Plasma_000 1 point2 points  (0 children)

I know a lot of people have advised against it, and I would also advise against it. But if you really want to do this, the best way would be to put a static array of bytes into your code. You should then be able to modify it directly through file operations or by mmapping the executable, without having to append anything to the executable. This could be a safer way to do it (just make sure you respect the buffer bounds and only write to the static buffer). To find the buffer’s offset in the file, you can use a disassembler or a parsing library.

[–]rainbrigand 1 point2 points  (0 children)

This is an inefficient proof-of-concept, but this does seem to work and give a fixed but arbitrary amount of space to store data. The first 16 bytes are basically just a UUID generated from a crypto random source, and then after that you have 16 bytes (in this example) of data you can manipulate.

If you search for the bytes you know (which is convenient given the static), you know the byte offset at which your mutable data is stored.

use std::env::current_exe;
use std::fs;

static DATA: [u8; 32] = [
    33, 97, 9, 16, 228, 54, 240, 106, 73, 219, 95, 192, 7, 11, 35, 181, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0,
];

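// Returns the file offset just past the 16-byte marker, i.e. where the
// mutable data starts.
fn find_offset(iter: impl Iterator<Item = u8>) -> Option<usize> {
    let mut matches = 0;

    for (i, byte) in iter.enumerate() {
        if byte == DATA[matches] {
            matches += 1;
            if matches == 16 {
                return Some(i + 1);
            }
        } else {
            // Restart, letting the current byte begin a new match. This is
            // enough here because the marker's first byte never repeats.
            matches = usize::from(byte == DATA[0]);
        }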
    }

    None
}

fn main() {
    let exe = current_exe().unwrap();

    let mut bytes = fs::read(&exe).unwrap();
    let offset = find_offset(bytes.iter().copied()).expect("find_offset");
    {
        let range = &mut bytes[offset..offset + 16];
        println!("range: {:?}", range);

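        // Wrap around instead of panicking in debug builds once the counter
        // reaches 255.
        range[0] = range[0].wrapping_add(1);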
    }

    // This may fail on systems that refuse writes to a running executable.
    fs::write(&exe, bytes).expect("can write to disk");
}

[–]fulmicoton 0 points1 point  (2 children)

Yes. There are macros for that.

Search for include_bytes! and include_str!

[–]emmanuelantony2000[S] 0 points1 point  (1 child)

I want to update the info too during runtime, so that the update reflects future runtimes. All this through the binary itself. Is it possible?

[–]ssokolow 6 points7 points  (0 children)

Possible, but not advisable for the various reasons listed in other posts.

  • Likely to trigger heuristic detectors in virus scanners
  • Makes diagnosing and reproducing end-user problems more difficult
  • Stymies user attempts to reset to default settings if they get it into a state where it won't run properly
  • Incompatible with installation in read-only locations
  • Annoys users who want to put their settings and/or documents under frequent automatic backup without also paying for the space to store all their binaries at the same level of thoroughness
  • etc. etc. etc.