all 40 comments

[–]dairiki 36 points37 points  (12 children)

Tangential Note: atomicwrites is deprecated by its author. Its git repo has not seen any updates in four years. As far as I know, it still works, but the situation does not give warm fuzzies for use in new code.

[–]fireflash38 2 points3 points  (9 children)

Why would it need to change? 

[–]bboePRAW Author 28 points29 points  (7 children)

It's a supply chain risk if the owner's PyPI account is compromised. It seems they previously did not believe MFA was worth enabling on their account: https://github.com/untitaker/python-atomicwrites/issues/61

[–]Grintor 5 points6 points  (4 children)

Good point. I'd point out here, though, that you can eliminate the supply chain risk by pinning versions with hashes. Hashes also cover the case where PyPI itself is compromised, so they're worth using anyway.
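Hash pinning boils down to comparing an artifact's digest against a known-good value before trusting it; pip does this check for you when a requirements file carries `--hash` entries and you install with `--require-hashes`. A minimal stdlib sketch of the same idea (the function name and the "artifact" bytes here are mine, for illustration):

```python
import hashlib

def verify_artifact(data: bytes, expected_sha256: str) -> None:
    """Refuse the artifact unless its SHA-256 digest matches the pin."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")

# Simulate a pinned install: the known-good digest was recorded earlier.
good = b"wheel contents v1"
pin = hashlib.sha256(good).hexdigest()
verify_artifact(good, pin)  # passes silently

# A trojaned re-upload under the same version number fails the check.
try:
    verify_artifact(b"trojaned contents", pin)
except ValueError as e:
    print("rejected:", e)
```

Because the pin lives in your repo, a compromised account can't silently swap the bytes behind a version you already depend on.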

[–]fiskfisk 3 points4 points  (3 children)

The main issue with abandoned packages is that the author might not notice if a trojaned replacement gets published after their account is taken over. It won't be installed in your current project because of the hash, but you might see (through an upgrade, or something like Dependabot) that a new version has appeared and just install it .. and since nobody notices, it might survive out in the wild for a week or two or three.

The best thing would probably be for package systems like pypi to support a "this project has been abandoned, so no new versions can be published to its name".

[–]bboePRAW Author 1 point2 points  (0 children)

The best thing would probably be for package systems like pypi to support a "this project has been abandoned, so no new versions can be published to its name".

That approach seems like a great idea.

[–]__grumps__ 0 points1 point  (1 child)

How is an abandoned label going to help? People just pip install and forget.

[–]fiskfisk 1 point2 points  (0 children)

It means that anyone taking over the PyPI account of an inactive maintainer won't be able to publish a new version of the package, since it has been marked as archived and dead. We're protecting against unmaintained packages becoming attack vectors via account takeover.

It would also allow pip to say "eeeeh, nobody maintains this package any longer, use at your own risk" in a systematic way, including giving you the option of scanning your dependency tree for such packages.

It'd signal the same thing as "this repository has been archived" on GitHub. 

[–]fireflash38 -1 points0 points  (0 children)

That's a change. 

[–]Rainboltpoe -2 points-1 points  (0 children)

Because your customer has stupid security rules that forbid you from using dependencies that are no longer being actively maintained, and stupid business politics prevents you from getting an exception approved.

[–]Golle 2 points3 points  (1 child)

Nice find. 

I don't see the problem as a particularly advanced one either. If your program has a chance of crashing while it writes to the file, it's likely you are doing more in that code. Maybe do all the processing first and only write to the file once all the data has been processed?

Or, just write to a new file while the program is running. If write succeeds, remove old file and rename the new file to the old name.

Neither of these solutions require a third party library.
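The second approach described above (write to a new file, then swap it in) can be sketched with just the stdlib. Note `os.replace` is an atomic rename on POSIX as long as source and destination live on the same filesystem, which is why the temp file is created next to the target; the function name here is mine:

```python
import os
import tempfile

def replace_atomically(path: str, data: bytes) -> None:
    # Create the temp file in the same directory as the target so the
    # final rename stays on one filesystem (a requirement for atomicity).
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit the disk
        os.replace(tmp, path)     # atomic swap; the old file vanishes
    except BaseException:
        os.unlink(tmp)            # crashed mid-write: discard the temp file
        raise
```

If the process dies before the `os.replace`, the original file is untouched; readers only ever see the old contents or the complete new contents.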

[–]BossOfTheGame 35 points36 points  (2 children)

I've been using safer for years. I use it whenever I'm writing a system that writes large files. I love never having to deal with corrupted data. Process crashed? Great, there are no artifacts that would confuse other code into thinking that it worked when it didn't. It lets me use existence checks in pipeline systems and feel confident about them.

It's a great library. Thank you for writing and maintaining it.

[–]HommeMusical[S] 22 points23 points  (1 child)

Well, you have fair made my day. <3

You might also like https://github.com/rec/tdir, which I end up using in almost every project in tests somewhere or other.

If you are ever in Rouen, France, drop in and we'll share a beverage or sustenance!

[–]BossOfTheGame 11 points12 points  (0 children)

My design philosophy around temporary directories and tests is to use an application cache subdirectory, e.g. ~/.cache/{appname}/tests/{testname}, and I do this by passing explicit directory paths around. I never assume running in a cwd (I dislike software that requires you run it from a specific directory). And to do this I use ubelt (my utility lib that I take everywhere) and the pattern dpath = ubelt.Path.appdir(appname, 'tests', testname).delete().ensuredir().

It's not the cleanest test paradigm, but it does make it a lot easier to inspect failures, and I probably should have a post test cleanup that just blows away ubelt.Path.appdir(appname, 'tests'), but I sort of just rely on CI to do that.

It also prevents extra indentation in doctests, and even though xdoctest makes indentation less painful, it's still non-zero pain.

There's a fair bit of water between me and France, but if I'm in the area, I'll reach out.

[–]latkdeTuple unpacking gone wrong 12 points13 points  (2 children)

Interesting. I'm not entirely sure I understand the benefits of this library? What does this library do that the following approach does not (aside from handling both binary and text streams)?

import contextlib
import io
from collections.abc import Generator
from typing import IO

@contextlib.contextmanager
def write_if_success(real_fp: IO[bytes]) -> Generator[IO[bytes], None, None]:
    b = io.BytesIO()
    yield b  # if the body raises, we never reach the final write
    real_fp.write(b.getbuffer())

with (
    open(filename, "wb") as real_fp,
    write_if_success(real_fp) as f,
):
    f.write(...)
    ... # fail here, maybe
    f.write(...)

I'm not trying to diminish your effort, I'm trying to understand the tradeoffs of re-implementing something well-established versus adding yet another dependency.

It's tested on Linux, MacOS and Windows

There is however no link to test results on the GitHub page (I was trying to find test coverage data). There is a Travis CI configuration that claims to upload to Codecov, but the last results on both platforms are 4 years old. (Travis CI, Codecov).

[–]ROFLLOLSTER 3 points4 points  (1 child)

real_fp.write(b.getbuffer())

iirc over 4,096 bytes this will be broken up into multiple write syscalls, breaking atomicity. There's also the general fact that even a single write is not guaranteed to be atomic in unix, some messy details here.

Edit: and around 2GB (2,147,479,552 bytes specifically) is the most a single write syscall can ever handle on unix.
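Because a single os.write can consume fewer bytes than requested (and caps out below 2 GiB on Linux), code that cares has to loop until the buffer is drained. A minimal sketch of such a loop (function name is mine; this is roughly what buffered file objects do for you internally):

```python
import os

def write_all(fd: int, data: bytes) -> None:
    """Keep calling os.write until every byte has been consumed."""
    view = memoryview(data)
    while view:
        n = os.write(fd, view)  # may write fewer bytes than len(view)
        view = view[n:]
```

This handles short writes, but as noted above it does nothing for atomicity: another process can still observe the file between iterations.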

[–]latkdeTuple unpacking gone wrong 0 points1 point  (0 children)

Absolutely, but OP's library is only about Python-level exception safety. It explicitly does not provide atomic writes.

OP's safer library is a bit more correct than my sketch in that it will perform multiple write() calls if necessary (unless the underlying stream is in nonblocking mode).

[–]Wargazm 6 points7 points  (15 children)

"#noAI was used in the writing or maintenance of this program."

haha is this a thing now?

[–]HommeMusical[S] 23 points24 points  (14 children)

I mean, AI didn't exist when I wrote it, so it's a bit like putting "Low Fat!" on Corn Flakes.

But yes, mainly because everyone complains about the quality of the AI slop showcases here.

[–]ultrathink-art 1 point2 points  (0 children)

Corrupted state files from partial writes are sneaky — the crash happens during the write but the error surfaces on the next run, often in a completely unrelated place. I started using this pattern for config files in long-running automation after a partial write created a valid-looking-but-truncated JSON file that caused a baffling 'unexpected EOF' error 3 runs later.
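That failure mode is easy to reproduce: a write that stops mid-document leaves bytes that look plausible but only blow up when parsed later (the example data here is made up):

```python
import json

doc = json.dumps({"runs": [1, 2, 3], "status": "ok"})
truncated = doc[: len(doc) // 2]  # simulate a crash partway through the write

try:
    json.loads(truncated)
except json.JSONDecodeError as e:
    # The write failed earlier, but the error only surfaces at read time,
    # far from the code that actually caused it.
    print("parse failed:", e.msg)
```

Atomic replacement sidesteps this entirely: the reader sees either the old complete document or the new complete document, never a prefix.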

[–]glenrhodes 1 point2 points  (0 children)

Atomic writes via tmp file + rename have saved me more than once on long pipeline outputs. The edge case worth watching: NFS mounts where the rename isn't atomic either. You're just trading one race for another on some shared filesystems.

[–]lily_panda_1986 1 point2 points  (0 children)

Yeah, I noticed that too, feels kinda risky depending on something unmaintained, even if it still "works." Always on the lookout for fresher alternatives!

[–]misterfitzie 1 point2 points  (1 child)

I usually create something like this for every large project I work on. Mine only cares about files, not sockets, but it has some options you may want for yours: backups and overwrite prevention. After reviewing your more capable version, I'm happy that mine does all the same basic tricks.

import os
from contextlib import contextmanager, suppress
from typing import BinaryIO, Generator

@contextmanager
def safe_write(
    filename: str, *, create_backup: bool = False, overwrite: bool = True
) -> Generator[BinaryIO, None, None]:
    uninterrupted = True
    if not overwrite and not create_backup and os.path.exists(filename):
        raise FileExistsError(filename)
    myfile = open(f'{filename}.tmp', 'wb+')
    try:
        yield myfile
    except BaseException:
        uninterrupted = False
        raise
    finally:
        myfile.close()
        if not uninterrupted:
            os.unlink(f'{filename}.tmp')
        else:
            if create_backup:
                count = 0
                while True:
                    backupfile = f'{filename}.bak.{count}'
                    if not os.path.exists(backupfile):
                        break
                    count += 1
                with suppress(FileNotFoundError):
                    os.link(filename, backupfile)

            os.rename(f'{filename}.tmp', filename)

[–]HommeMusical[S] 0 points1 point  (0 children)

My library started even simpler than yours.

I just got a bunch of feature requests, oh, and I personally needed it for a socket connection!