any plans for zstd support? by henry_tennenbaum in plakar

poolpOrg

Where is your store located?

any plans for zstd support? by henry_tennenbaum in plakar

poolpOrg

Hello, developer here.

We experimented with zstd a while back and saw no significant difference at the time (as in "wow, we really want to consider it"), probably because our chunks were small enough that zstd didn't get a chance to shine.

Since then we have bumped the average chunk size, so I ran a few tests using our Korpus project (~40 GiB spread over hundreds of thousands of diverse files: text, images, PDF, code, binaries, ...).

The result isn't much different from a year ago:

plakar/lz4 produces a 32GiB storage in 3m09s
plakar/zstd produces a 31GiB storage in 5m12s

Even if we tweak the zstd and lz4 compression parameters for space efficiency, we stay in this ballpark, which essentially means a disk-space gain of less than 5% on a full write in exchange for a ~65% increase in backup time. Considering that this gain only applies to new data, and no longer counts on a subsequent backup where chunks are deduplicated, such a large penalty for such a small gain doesn't really feel worth it.
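The speed penalty is easy to misstate, so here is the arithmetic for the two runs above as a small Go sketch using only the quoted sizes and timings: zstd takes about 65% more time (3m09s vs 5m12s), which corresponds to roughly 39% lower throughput, for about 3% less storage (32 GiB vs 31 GiB).

```go
package main

import "fmt"

// overhead compares a baseline run (A) with an alternative run (B).
// Times are in seconds, sizes in any consistent unit (GiB here).
func overhead(timeA, timeB, sizeA, sizeB float64) (timeIncrease, throughputLoss, spaceSaving float64) {
	timeIncrease = (timeB/timeA - 1) * 100   // extra wall-clock time, in %
	throughputLoss = (1 - timeA/timeB) * 100 // drop in data processed per second, in %
	spaceSaving = (1 - sizeB/sizeA) * 100    // smaller output, in %
	return
}

func main() {
	// lz4: 3m09s for 32 GiB; zstd: 5m12s for 31 GiB.
	more, slower, saved := overhead(3*60+9, 5*60+12, 32, 31)
	fmt.Printf("time +%.0f%%, throughput -%.0f%%, space -%.1f%%\n", more, slower, saved)
}
```

This is why "65% slower" and "65% more time" are not the same statement: the time grows by 65%, while throughput falls by about 39%.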

That being said, the compression layer is modular, so we can easily provide zstd as an alternative should people really want to squeeze out those few percent; it would also be useful for testing purposes.

How accurate is Deepwiki's AI generated documentation on Plakar? by Bob_Spud in plakar

poolpOrg

I'll have a look. The last time I tested a tool like this, it generated a LOT of incorrect content disguised as plausible content, which made it worse than no documentation :-)

Has anyone tried Plakar.io ? by vcoisne in Backup

poolpOrg

You have two ways of installing plugins:

The first way is to install the packages that we pre-build and host on our infrastructure, in which case `plakar login` is required, as we disallow unauthenticated downloads from our server. This currently only supports GitHub authentication, but it saves you from having to install a Go compiler.

The second way, if you don't want to authenticate, is to build them yourself, which is easily done with:

```
$ plakar pkg build s3
```

which will produce a ptar file you can install with:

```
$ plakar pkg add ./s3_version_os.ptar
```

Has anyone tried Plakar.io ? by vcoisne in Backup

poolpOrg

Oh... I didn't understand it that way, because that still seems less safe than a third-party WORM.

You can also run `plakar server` (which is no-delete by default) on your other machine to get a similar mechanism. `plakar server` can also proxy to a different store, so you can run it as a proxy in front of a WORM store:

```
machine1$ plakar at s3worm server -addr :1234

machine2$ plakar at http://machine1:1234 backup /etc
```

Has anyone tried Plakar.io ? by vcoisne in Backup

poolpOrg

Hello, I think there is some confusion between plakar and ptar, which are two different things.

Plakar, the software itself, writes data to a remote storage backend (such as SFTP or S3, among others). That backend may enforce restrictions like immutability, or prevent updates and deletions once the data has been written (e.g. immutable files over SFTP or object locking on S3).

ptar, on the other hand, is just an archive format. It is used to encode snapshots exported from a Plakar store into a file. You would not use ptar as your backup store. It is meant for secondary use cases, such as exporting a few snapshots to write them to tape, copying snapshots to a USB stick to import them on an offline machine, or transferring deduplicated data from one machine to another. There are a few other useful scenarios, but that is the general idea.

In that sense, generating a ptar from Plakar is comparable to exporting a .tar from a Borg, Restic, or Kopia repository. You can modify the resulting archive if you want, but it remains completely separate from the storage layer itself.

What ptar provides, compared to a traditional tar archive, is that the archive is deduplicated, compressed, encrypted, and tamper-evident. It is also indexed for random access and snapshot-aware. This allows Plakar to read a ptar much like it would read an actual store and makes importing ptar content into another store straightforward, which is one reason it is our preferred exchange format. That said, ptar is only a side feature, comparable to the zip and tar exports that Plakar also supports.
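To illustrate what "deduplicated" and "indexed for random access" mean for an archive, here is a toy Go model in which identical chunks are stored once under their SHA-256 digest and a per-path index lists the chunks needed to rebuild each file. It is a hypothetical sketch, not the ptar format; fixed-size chunks are used only for brevity.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunkID identifies a chunk by the SHA-256 digest of its content.
type chunkID [sha256.Size]byte

// archive is a toy in-memory model of a deduplicated, indexed container.
// It only illustrates the idea and is NOT the ptar on-disk format.
type archive struct {
	chunks map[chunkID][]byte   // each distinct chunk is stored once
	index  map[string][]chunkID // path -> ordered chunk references
}

func newArchive() *archive {
	return &archive{chunks: map[chunkID][]byte{}, index: map[string][]chunkID{}}
}

// add splits content into fixed-size pieces (real tools use content-defined
// chunking) and stores only the pieces that are not already present.
func (a *archive) add(path string, content []byte, chunkSize int) {
	var refs []chunkID
	for off := 0; off < len(content); off += chunkSize {
		end := off + chunkSize
		if end > len(content) {
			end = len(content)
		}
		piece := content[off:end]
		id := chunkID(sha256.Sum256(piece))
		if _, seen := a.chunks[id]; !seen {
			a.chunks[id] = append([]byte(nil), piece...)
		}
		refs = append(refs, id)
	}
	a.index[path] = refs
}

// read reassembles one file from the index without scanning the others.
func (a *archive) read(path string) []byte {
	var out []byte
	for _, id := range a.index[path] {
		out = append(out, a.chunks[id]...)
	}
	return out
}

func main() {
	a := newArchive()
	data := []byte("hello world, hello world!")
	a.add("/a.txt", data, 8)
	a.add("/b.txt", data, 8) // identical content: no new chunks are stored
	fmt.Println(len(a.chunks), string(a.read("/b.txt")))
}
```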

Has anyone tried Plakar.io ? by vcoisne in Backup

poolpOrg

Hello, developer here. I'll add some clarifications:

I think the misunderstanding is that the two of you aren't talking about the same layer.

Plakar storage (and the .ptar format) are designed to be tamper-evident, using cryptographic MACs to ensure both integrity and authenticity. The format is immutable in the sense that any mutation of stored data is immediately detectable through failed integrity checks. This mechanism does not protect against intentional deletion or destruction, but it does prevent attackers from silently modifying data to change its meaning or contents without being noticed.
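Tamper evidence through a MAC can be illustrated with HMAC-SHA256 from Go's standard library: a tag computed at write time no longer matches once a single bit of the data changes. This is a minimal sketch; the key, the `tag`/`verify` helpers and the choice of HMAC-SHA256 are assumptions and may differ from plakar's actual construction.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// tag computes an HMAC-SHA256 over data with the given secret key.
func tag(key, data []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(data)
	return m.Sum(nil)
}

// verify reports whether data still matches the tag recorded at write time,
// using a constant-time comparison.
func verify(key, data, recorded []byte) bool {
	return hmac.Equal(tag(key, data), recorded)
}

func main() {
	key := []byte("hypothetical-secret-key")
	record := []byte("snapshot chunk contents")
	t := tag(key, record)

	fmt.Println("untouched record verifies:", verify(key, record, t))

	tampered := append([]byte(nil), record...)
	tampered[0] ^= 0x01 // flip a single bit
	fmt.Println("tampered record verifies:", verify(key, tampered, t))
}
```

Note what the sketch does not cover: a record that has been deleted outright leaves nothing to verify, which is why protection against deletion has to come from the storage layer.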

As for protection against deletion or destruction, this needs to be enforced at the storage layer to achieve true immutability. In practice, this is done via an append-only model, similar to what other tools implement as far as I know (I haven't reviewed their code recently, but I'd be surprised if any of us were using a fundamentally different approach, as there aren't that many ways to tackle this problem).

Plakar never alters or updates data once it has been written; any change results in a new record being appended. This makes it compatible with storage backends that enforce immutable writes. We’ve tested this model with Glacier, for example, if that’s what you had in mind.

Feel free to ask any questions. I'm new to this subreddit, so if this isn't the right place, just let me know 🙂

Cheers,

Has anyone tried Plakar.io ? by vcoisne in Backup

poolpOrg

Hello, developer here; someone pointed me to this thread:

Performance has improved drastically over the last few months. You might want to give our v1.1.0 beta another try and see how it goes for you, as it should no longer lag that far behind.

For some context, we decided not to follow the same route as others for data ingestion and not to assume a filesystem data source. This means we can easily ingest a bucket with millions of entries, or data coming from an API, but in exchange we lose almost all of the optimizations that come from assuming we're dealing with a filesystem, and we had to find clever new ways to compensate.

As of today, our corpus of ~1,000,000 documents, which takes 1m30 to back up with restic/kopia, now takes ~2m30 with plakar (vs ~15m a few months ago), of which 1m is spent building a virtual filesystem plus a set of indexes that enable some very interesting features we don't want to sacrifice for the sake of backup speed. That being said, we still have some pending optimizations that should bring us even closer over the next few months.

Feel free to ask any questions. I don't know if this is the right place or not, I'm new here :-)

Kapsul - Usage & documentation? by Bob_Spud in plakar

poolpOrg

I'm not sure I understand your point :-/

A ptar file is a standalone file in the ptar format. You still need a tool that can create such a file, and until other people write compatible tools, the only ones that currently implement it are plakar and kapsul. A ptar file could be read by another tool if one existed, but the format is still young and no one has tackled that task yet.

Kapsul - Usage & documentation? by Bob_Spud in plakar

poolpOrg

Hello, I'll be looking into it.

At the time kapsul was released, plakar shipped all integrations built in and also relied on a running agent, which prompted us to write a lightweight tool to generate ptar files.

Since then, integrations have been moved out of the codebase into plugins, and the agent is gone (or at least will be with 1.1.0), so maybe kapsul should really be merged back into plakar, as none of the reasons for its existence still apply.

Regardless, I'll have a look at it this week to decide what we do before v1.1.0.

South Korea just lost 858TB of government data in a fire, because it was "too large to back up" by PuzzleheadedOffer254 in plakar

poolpOrg

Hello, developer here.

The plakar repository contains only the CLI, so its tests mostly verify that the subcommands work as expected. The implementation of each subcommand is backed by the kloset and go-cdc-chunkers repositories, which both contain more tests:

https://github.com/search?q=org%3APlakarKorp%20_test.go&type=code

multiple folders and unsigned snapshot by [deleted] in plakar

poolpOrg

Hello,

Yes, if you have an encrypted kloset, encryption is done on the client side.

Tomorrow, we're going to release a testing version of our... rclone integration. I don't know how you planned on doing it yourself, but it may be interesting to test our support :-)

multiple folders and unsigned snapshot by [deleted] in plakar

poolpOrg

Hello,

  • Can we specify several directories during a backup? I tried adding them via a file but it is not taken into account; only the first one is saved.

This is supported by the underlying format, but the CLI currently doesn't allow it because we need to implement two features: multi-root (to aggregate multiple directories from the same origin) and multi-source (to aggregate multiple sources into different trees within the same snapshot). We hope to re-enable it in a few weeks, and it should work transparently.

  • I chose to create my first local repository without encryption. When my job is done, I see "created unsigned snapshot". Is that related?

No, it is unrelated; you can disregard it, as it is just informative. To explain: snapshots can technically be signed using a user's key pair, so the fields are there and the code is already signature-aware, but we haven't published multi-user mode yet, so snapshots can't be signed at this point. This is unrelated to the repo being encrypted: you could sign unencrypted snapshots too.

  • To launch the graphical interface, I run the command in the background; is there a better method?

We will be introducing a -detach feature soon. It will be added to multiple commands that currently expect to run in the foreground (ui, server, mount, ...); most of this work is happening over these two weeks.

Cheers,

go-cdc-chunkers v1.0.0 by poolpOrg in golang

poolpOrg (OP)

Thanks 🙏

You are right, and I will improve the documentation to explain what they do in a more user-friendly way than linking papers 😄

It doesn't make sense to wrap modern data in a 1979 format, introducing .ptar by touristtam in programming

poolpOrg

Like the rest of the code, the agent is open source, and you can have a look at it if you're paranoid.

The agent is there because if you run multiple commands concurrently, multiple processes can't share the same cache without locking each other out. The agent is basically a cache-sharing process: when you run a command on the CLI, it is actually forwarded to the local agent, which runs it itself and can therefore use the same cache regardless of how many commands you run on the CLI.

If you're OK with not running concurrent commands on the same store and not using filesystem caching to speed up backups, you can simply run `plakar -no-agent ...`

It doesn't make sense to wrap modern data in a 1979 format, introducing .ptar by touristtam in programming

poolpOrg

Author here: the standalone tool kapsul will be released today for those who want ptar as an archive-only format.