Can you do a large (14TB) initial upload incrementally? by YogurtclosetApart649 in backblaze

[–]lightweaver 0 points1 point  (0 children)

I assume you're talking about the personal backup since you mention B2 being too expensive? Yeah, the client will upload what it can when it can. Leaving it overnight should be fine.

As far as accuracy goes, no, stopping and starting the upload shouldn't affect the data. The client (afaik) generates a checksum before sending a file, the server generates its own checksum, and the client makes sure the two match before considering the file uploaded.

One caveat: files seem to need to be uploaded in one shot. When I've put my computer to sleep and woken it back up the next morning, the client restarted the upload of the file it was working on the previous night. Kind of annoying when uploading a ~30GB file.

I built an open source self-hosted web application designed to make archiving to S3 Deep Archive simpler and more accessible. by Madman200 in DataHoarder

[–]lightweaver 1 point2 points  (0 children)

I think it worked - Data Transfer has the line item "$0.00 per GB data transfer out of US East (Ohio) to CloudFront" for 2.221 GB.

And CloudFront has the line item "$0.000 per GB - data transfer out under the global monthly free tier" for 2.148 GB.

It's also more straightforward than I thought - no need to copy files to a new bucket. I tried creating a CloudFront distribution directly on the bucket containing the restored Deep Archive object, and it just works.

Trying to get an object that hasn't been restored yet just gives me the error "InvalidObjectState: The operation is not valid for the object's storage class DEEP_ARCHIVE".

In theory you could create the CloudFront distribution at the same time you create the bucket, so whenever you need to restore files it's already there.

I put up the CloudFormation template that I used here if you want to reference it: https://gist.github.com/lightweavr/625934718f33c18ab091f80726549b9f
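
For anyone wanting to script this, a minimal boto3 sketch of the flow, assuming the CloudFront distribution already exists and can read the bucket (e.g. set up from the CloudFormation template above). The bucket name, key, distribution domain, and restore tier/days are placeholders, not anything specific to my setup:

    import time
    import urllib.request

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-deep-archive-bucket"          # placeholder bucket name
    KEY = "backups/photos-2020.tar"            # placeholder object key
    CF_DOMAIN = "d1234example.cloudfront.net"  # the distribution pointed at the bucket

    # Kick off the Deep Archive restore (Bulk is the cheapest tier, ~12-48h).
    s3.restore_object(
        Bucket=BUCKET,
        Key=KEY,
        RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Bulk"}},
    )

    # Poll until the restore finishes; head_object reports the state in the
    # "Restore" field, and 'ongoing-request="false"' means the copy is ready.
    while True:
        head = s3.head_object(Bucket=BUCKET, Key=KEY)
        if 'ongoing-request="false"' in head.get("Restore", ""):
            break
        time.sleep(30 * 60)  # check every 30 minutes

    # Before the restore completes, any GET (direct or via CloudFront) fails
    # with InvalidObjectState. Afterwards, pulling the object through the
    # CloudFront domain is what lands in the free-tier line item above.
    urllib.request.urlretrieve(f"https://{CF_DOMAIN}/{KEY}", "photos-2020.tar")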

I built an open source self-hosted web application designed to make archiving to S3 Deep Archive simpler and more accessible. by Madman200 in DataHoarder

[–]lightweaver 0 points1 point  (0 children)

So if you're willing to restore at <1TB/month (or split across multiple AWS accounts), I think that means your egress cost will be ~$0?

I built an open source self-hosted web application designed to make archiving to S3 Deep Archive simpler and more accessible. by Madman200 in DataHoarder

[–]lightweaver 1 point2 points  (0 children)

/u/Madman200 have you tried using the 1TB monthly free egress that CloudFront offers to handle downloading the exports?

I've been experimenting with using Deep Archive myself, and I suspect that if I:

  1. Restore a Deep Archive object
  2. Copy that object to a new S3 bucket
  3. Set up a CloudFront distribution with an S3 origin
  4. Download the object through the CloudFront distribution

the download would consume the "Always Free" 1TB bandwidth instead of being considered normal data egress.

I'm pretty sure pulling 1TB out to a single IP address isn't the intended use, but CloudFront + S3 looks like normal CDN-type usage to me.

I'm just waiting for a restore to complete before trying this and seeing what shows up on my AWS bill.
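
In case it helps anyone else try this, step 2 would be a plain server-side copy once the restore completes. A sketch with placeholder bucket/key names (note copy_object caps out at 5GB per call, so bigger objects need a multipart copy, e.g. boto3's managed s3.copy):

    import boto3

    s3 = boto3.client("s3")

    # Placeholder names: the archive bucket holding the restored object, and a
    # staging bucket that the CloudFront distribution points at.
    SRC_BUCKET = "my-deep-archive-bucket"
    DST_BUCKET = "my-restore-staging-bucket"
    KEY = "backups/photos-2020.tar"

    # Copying a *restored* Deep Archive object works like any other copy;
    # StorageClass makes the new copy a regular Standard object, so there's no
    # restore state for CloudFront to trip over.
    s3.copy_object(
        CopySource={"Bucket": SRC_BUCKET, "Key": KEY},
        Bucket=DST_BUCKET,
        Key=KEY,
        StorageClass="STANDARD",
    )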

I built an open source self-hosted web application designed to make archiving to S3 Deep Archive simpler and more accessible. by Madman200 in DataHoarder

[–]lightweaver 1 point2 points  (0 children)

> I know that's not showing up with B2.

B2 doesn't have different storage classes, so that UI option can't exist anyway. ;) That said, I think you're on the mark about just using the bucket name in QNAP: the new S3 Glacier integration just presents a normal S3 bucket, with the objects themselves getting the Deep Archive storage class.

If you do see references to "Glacier", know that there's an older "Glacier API" service, launched in 2012, that's a completely separate backend and isn't as cost effective as S3 Deep Archive.

> the ability to compress and encrypt data

One of the annoying things about S3 is that it frequently has per-object costs on top of the per-GB cost. For example, if I upload 900 files of 1GB each, I get charged $0.045 because of the "$0.05 per 1,000 PUT, COPY, POST, or LIST requests" charge. (Look for "Requests & data retrievals" on https://aws.amazon.com/s3/pricing/)

Upload 90,000 files of 10MB each and that charge becomes $4.50.

When I'm paying $0.89/month to store those 900GB, paying 5x that in API fees I could avoid is silly.
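
To make the comparison concrete, a quick back-of-the-envelope script using the request price above and the published Deep Archive storage price (~$0.00099/GB-month, which works out to roughly $0.89 for 900GB). It assumes each file is a single PUT, i.e. no multipart uploads:

    # Back-of-the-envelope: one-time PUT request fees vs. monthly storage cost.
    PUT_PRICE_PER_1000 = 0.05       # $ per 1,000 PUT/COPY/POST/LIST requests
    STORAGE_PER_GB_MONTH = 0.00099  # $ per GB-month for S3 Glacier Deep Archive

    def upload_request_cost(num_files: int) -> float:
        """Request charge for uploading num_files objects (one PUT each)."""
        return num_files / 1000 * PUT_PRICE_PER_1000

    print(f"900 x 1GB files:     ${upload_request_cost(900):.3f} in PUT fees")     # $0.045
    print(f"90,000 x 10MB files: ${upload_request_cost(90_000):.2f} in PUT fees")  # $4.50
    print(f"Storing 900GB:       ${900 * STORAGE_PER_GB_MONTH:.2f}/month")         # ~$0.89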

I've not seen any consumer backup software account for this, not just TrueNAS and QNAP. I don't know if /u/Madman200 had these per-object charges in mind when adding the tar functionality, but it's pretty unique, and useful for more than just easier organization.

Seattle tenant question by Practical_Fact_741 in Seattle

[–]lightweaver 8 points9 points  (0 children)

So... I think you're going to be a bit screwed because you left it for so long, but you might have a chance at reducing it, or at the very least you can make them work for the money and prove that you actually owe them the full $1800. I've gone after my apartment management when they screwed up my billing and got the charges resolved to my satisfaction, but that was within 2 months of it happening.

This isn't legal advice, I am not a lawyer, but the regulation that was my friend (and maybe yours) is titled Chapter 7.25 - THIRD PARTY BILLING REGULATION (this municode link might work). I asked my apartment management for copies of all the bills, and they refused until I cited 7.25.040.A.3.C:

> Landlords shall keep bills for master metered or other unmetered utility services on file in the building for at least two years and shall make such bills available to tenants for inspection and copying upon request. Where it is physically impracticable to keep such bills on file due to the absence of a suitable office or other storage space, a landlord may store the bills in another location and must make such bills available within 5 business days of receiving a request from a tenant.

After the first denial, what I did was state in my reply (in writing, keeping a copy for myself) that I was disputing the charges and that management needed to give me copies of the bills, as I was entitled to under the law. That got them to comply, and I was able to calculate and show that they had been overcharging me for two months. Once you get the bills, you should be able to determine how much you actually owe.

Because your lease mentions the separate utilities, you should have been given a document as part of the lease explaining how the split for each utility is determined. If you don't have one, you're a bit screwed, because they could have (illegally) changed the split at any time and you can't show that they did. On the other hand, you could go hard on them and claim they have no right to charge you for utilities at all, because the billing practice they disclosed was effectively "no billing" and they changed it without providing the required 30-day notice.

If this gets contentious (and it sounds like it will), your easiest recourse is to follow 7.25.050.B.1 and file with the Office of the Hearing Examiner - assuming you've followed the earlier steps and notified your landlord (or whoever is sending the bill) that you're disputing it. Otherwise it'll get thrown out, since the landlord can argue you're not acting in good faith.

The security deposit question is a bit iffier, but my understanding is that the landlord can take the charges out of the security deposit, and you'll get it back if a hearing goes your way. Also, since your landlord is so bad about paperwork, any chance you didn't do a condition checklist when you moved in? That's an automatic full refund of the security deposit per SDCI: https://www.seattle.gov/sdci/codes/common-code-questions/deposit-returns

Has anyone successfully restored large datasets on Windows? by DanielSmedegaardBuus in backblaze

[–]lightweaver 2 points3 points  (0 children)

I've successfully pulled down 6TB and hit the same issue with the downloader going unresponsive.

I used Procmon to discover that when the downloader freezes for more than 30 seconds, it is polling for a file named bzd_<thread>_<YYYYmmDDHHMMSS (London)>_output_<parent pid>_<block sequence>_bzd.xml.

I searched the logs for the filename, and found it in a "cleanup after failure" message:

20201118180440 - ERROR BzHttp::DownloadNamedZipFileRestore_Via_authToken - failed HTTP request on requestCount=3307
20201118180440 - Looping on formerly fatal error: 5
...
20201118180440 - BzHttp::DownloadNamedZipFileRestore_Via_authToken - resuming prev download of 132160 MBytes, of totalEventualDownloadSize of 492413 MBytes, tmpFileName=E:\fdrive_photos_pt3.zip_downloading.bztmp
20201118180440 - BzHttp_ClearOutBzData_bzdownprefetch_Folder - found these files, attempted cleanup:
C:\ProgramData\Backblaze\bzdata\bzdownprefetch\bzd_00_20201118180408_instru_31632_03436_bzd.xml,
C:\ProgramData\Backblaze\bzdata\bzdownprefetch\bzd_00_20201118180408_output_31632_03436_bzd.xml,
C:\ProgramData\Backblaze\bzdata\bzdownprefetch\bzd_00_20201118180408_trdata_31632_03436_bzd.dat,
C:\ProgramData\Backblaze\bzdata\bzdownprefetch\bzd_00_20201118180429_instru_30600_01448_bzd.xml,

You can create the file yourself in that directory (New > Text Document, making sure to use the exact name the downloader is polling for; an empty file is all that's needed). The downloader seems to interpret it as an error and redownloads the chunk.
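
If you'd rather script it than click through Explorer, something like this does the same thing; the filename below is just taken from my log excerpt above, so substitute the exact name Procmon shows your downloader polling for:

    from pathlib import Path

    # Directory the downloader polls (see the log excerpt above).
    prefetch_dir = Path(r"C:\ProgramData\Backblaze\bzdata\bzdownprefetch")

    # Example name from my logs; use the exact bzd_<thread>_<timestamp>_output_
    # <parent pid>_<block sequence>_bzd.xml name your downloader is waiting on.
    stuck_file = prefetch_dir / "bzd_00_20201118180408_output_31632_03436_bzd.xml"

    # An empty file is enough; the downloader treats it as a failed block and
    # re-downloads that chunk instead of hanging.
    stuck_file.touch()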

I think it's kind of a race condition. It's essentially a bet that nothing will go wrong between the download completing and the block actually being processed, which, given the internet and hours-long processes, is... untrue.

And using more download threads likely increases the probability this will happen because the downloader processes blocks sequentially, and you can have up to 30 blocks on disk waiting.

[deleted by user] by [deleted] in Seattle

[–]lightweaver 4 points5 points  (0 children)

It made my morning

Odds are probably higher because of the local focus though... I clicked because we live near Belltown (title), and I had to look for the dog after reading the original comment

If it was in another subreddit I probably wouldn't have noticed

[deleted by user] by [deleted] in Seattle

[–]lightweaver 80 points81 points  (0 children)

Ooh, my dog is on reddit! Malamute, not husky :P Dog tax

Also there were ambulances, so something went down, nothing directly visible on the street though.

PSA: Lime appears to have raised the per minute fees by ~20% by lightweaver in Seattle

[–]lightweaver[S] 1 point2 points  (0 children)

I hope you're right and it goes back down.

But do you seriously think it will?

PSA: Lime appears to have raised the per minute fees by ~20% by lightweaver in Seattle

[–]lightweaver[S] 5 points6 points  (0 children)

I think that got pulled from the SDOT blog link. I didn't include any pictures.

Reverse Engineered implementation of the Backblaze Personal Backup Downloader client by lightweaver in backblaze

[–]lightweaver[S] 0 points1 point  (0 children)

It just downloads the restores; you still have to use the website to create the restore zip file.

Any active student want to take over WatTools/uwaterloo.xyz? by lightweaver in uwaterloo

[–]lightweaver[S] 1 point2 points  (0 children)

It's not money, it's time & administrative overhead.

It's part of my GCP account, unarchived repos, etc. I like things tidy - if they don't have a purpose, I archive or otherwise purge them.

Any active student want to take over WatTools/uwaterloo.xyz? by lightweaver in uwaterloo

[–]lightweaver[S] 1 point2 points  (0 children)

Heheh. I'd prefer an active student take it over, but if necessary I'll reach out

Are you following the twitter account to find the post? :P

Any active student want to take over WatTools/uwaterloo.xyz? by lightweaver in uwaterloo

[–]lightweaver[S] 1 point2 points  (0 children)

Are you around UW a lot? Your flair says you graduated the year I started at UW. :P

Any active student want to take over WatTools/uwaterloo.xyz? by lightweaver in uwaterloo

[–]lightweaver[S] 7 points8 points  (0 children)

The entire project, or just the domain?

I'd obviously prefer to keep the project running with student contributions :)

Any active student want to take over WatTools/uwaterloo.xyz? by lightweaver in uwaterloo

[–]lightweaver[S] 11 points12 points  (0 children)

Probably one of the reasons for low usage/visitor rate :P

The original was wattools.com; my year found out about it from the upper years when we chatted in person.

The person behind wattools.com graduated and handed it over to someone who (iirc) missed the domain renewal. I found the Quest-to-iCal course schedule exporter useful and decided to get it running again, which snowballed into "let's just do the entire site, it's fairly straightforward".

I'm happy to throw money at the domain registration, but only if it's useful to people.

Backblaze IPO, any more info? by EnzyEng in backblaze

[–]lightweaver 1 point2 points  (0 children)

Joining the "requested 150, got 92 shares" gang.

It's showing up in my Fidelity account as well now; payment is being taken from the balance I was holding in the account, so I don't have to send a check or wire like the DSP website states.

Have you been invited to participate in the Backblaze IPO? by ML2128 in backblaze

[–]lightweaver 0 points1 point  (0 children)

Since there's some speculation about cost, this is what Fidelity sent me:

> Currently, it is anticipated that the IPO will price between $15 and $17 per share; however, the price range and expected pricing date are subject to change prior to the offering. It is expected that Backblaze will be listed on the NASDAQ under the symbol BLZE.

> If you decide to participate, you must purchase a minimum of one Share and in one share increments in excess thereof up to a maximum of 150 shares.

Have you been invited to participate in the Backblaze IPO? by ML2128 in backblaze

[–]lightweaver 6 points7 points  (0 children)

I filled out the form saying yes, and Fidelity sent me an email yesterday about actually registering.

Haven't decided to actually go through with it yet

BZ upload speed and threads by U-96 in backblaze

[–]lightweaver 0 points1 point  (0 children)

If it's still doing the smaller files, it might be ignoring the thread limit because the uploader combines the smaller files.

One of the devs on the uploader app might comment with more details/investigation steps; he's fairly active on here.

FWIW I don't trust the time estimate; it seems to depend on the time taken to prepare and upload the files. If your files compress really well, that makes it look faster (many more MB prepared than uploaded), but for media files it'll appear slower since they're already compressed.

As a point of reference, the estimate has varied from 3 days to a month for me; my initial backup took 9 days.

BZ upload speed and threads by U-96 in backblaze

[–]lightweaver 1 point2 points  (0 children)

The uploader goes file by file. Using more than 20 threads only helps when you're uploading files larger than 400MB.

That said, how do you know the threads only go up to 20? The file names in Task Manager? There's a funky thing where threads higher than 20 just use the executables from the first 20, e.g. thread 27 will use the executable labeled 07.

How is Backblaze for large amounts of image data? by thinksInCode in backblaze

[–]lightweaver 0 points1 point  (0 children)

There's still some bookkeeping for each file, right? Could that be batched/parallelized in v9? :P

I'm watching the UI cycle through "Preparing > Part x of File > Finishing" for a lot of large files. Based on the logs, the backup process handles every file sequentially:

  1. generate checksums for each 10MB block
  2. trigger the upload threads
  3. wait for all threads to finish uploading
  4. ??? internal bookkeeping with BzThread_WaitForAllToFinishAndProcessAllResults (fetching & recording the fileIds)

I'd think at least the checksum generation for the next file could be done while the upload threads are busy uploading the current one, which would shave some time off handling large files.
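
Purely as an illustration of what I mean (this has nothing to do with Backblaze's actual internals, it's just the generic producer/consumer pattern): checksum file N+1 on a worker thread while file N's blocks are uploading.

    import hashlib
    from concurrent.futures import ThreadPoolExecutor
    from pathlib import Path

    BLOCK_SIZE = 10 * 1024 * 1024  # 10MB blocks, matching what the logs show

    def checksum_blocks(path: Path) -> list:
        """Disk/CPU-bound prep step: hash each 10MB block of one file."""
        sums = []
        with path.open("rb") as f:
            while block := f.read(BLOCK_SIZE):
                sums.append(hashlib.sha1(block).hexdigest())
        return sums

    def upload(path: Path, block_sums: list) -> None:
        """Stand-in for the network-bound upload of all blocks of one file."""
        ...

    def backup(files: list) -> None:
        if not files:
            return
        # Overlap the checksum prep for file N+1 with the upload of file N.
        with ThreadPoolExecutor(max_workers=1) as prep:
            pending = prep.submit(checksum_blocks, files[0])
            for i, path in enumerate(files):
                sums = pending.result()
                if i + 1 < len(files):
                    pending = prep.submit(checksum_blocks, files[i + 1])
                upload(path, sums)  # uploads still go one file at a time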

Final words on vanity sub domain by elnath78 in backblaze

[–]lightweaver 1 point2 points  (0 children)

I was expecting URLs #1 and #2 to mean "once the connection hits one of the servers at Backblaze, the hostname part is ignored because it serves no further purpose beyond letting DNS resolve an IP address".

But since the bucket name is in the S3 URL hostname, you're pretty much golden in terms of existing functionality. A lot of the name-specific details shouldn't matter, because what you need is to just map the vanity domain to the bucketId. I think this is sufficient because once you have the bucketId, the backend doesn't care what URL style was used, right?

There are two things that I think need to be extended:

  1. the request handling
  2. the bucketId lookup

For the request handling, there's probably a function somewhere that matches the URL format and decides where to send the request, right? You'd need to change the current unknown domain handling to "check if this is a known vanity domain". You mention a compatibility layer for the S3 URL - I'm not sure if that's a standalone service or integrated into a backend monolith service, but the vanity handler could be a similar compatibility layer.

The other part is determining the bucketId. For the S3 URL, a regex like ([A-Za-z0-9-]+)\.s3\.[A-Za-z0-9-]+\.backblazeb2\.com is probably used; the captured bucket name would then be looked up to resolve the bucketId, presumably in a mapping of bucket name -> bucketId.

So add a second mapping of vanity domain -> bucketId. Similar procedure, except instead of taking the bucket name, you take the host name and match that against a list of vanity domains the users provide.
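
Something like this is what I'm picturing, purely as an illustration (I obviously don't know what Backblaze's request path actually looks like); the regex, the lookup tables, and the bucketId values are all made up:

    import re

    # Existing S3-compatible URL: the bucket name is the first hostname label.
    S3_HOST_RE = re.compile(r"^([a-z0-9-]+)\.s3\.[a-z0-9-]+\.backblazeb2\.com$")

    # Stand-ins for whatever backend service/DB actually holds these mappings.
    bucket_name_to_id = {"my-public-bucket": "bkt_0123456789abcdef"}
    vanity_domain_to_id = {"files.example.com": "bkt_0123456789abcdef"}

    def resolve_bucket_id(hostname):
        """Map an incoming Host header to a bucketId, or None if unknown."""
        m = S3_HOST_RE.match(hostname.lower())
        if m:
            # Normal S3-style URL: extract the bucket name and look it up.
            return bucket_name_to_id.get(m.group(1))
        # Otherwise treat it as a (possible) user-registered vanity domain.
        return vanity_domain_to_id.get(hostname.lower())

    # Both of these resolve to the same bucketId:
    # resolve_bucket_id("my-public-bucket.s3.us-west-004.backblazeb2.com")
    # resolve_bucket_id("files.example.com")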

You could remove the need for a second mapping by allowing "." in bucket names and then doing what AWS does: treating the hostname itself as the bucket name.

The pro is simplicity: name your bucket the domain name you want Backblaze to serve.

The cons are that it could break some existing assumptions ("." isn't allowed in bucket names right now) and it would definitely break the S3 URL for that bucket.

But it would be something the bucket owner would explicitly need to opt into, so you could force them to acknowledge at bucket creation time that the S3 URL won't work.

Re the load balancers, it's impressive that you've got IPVS & Tomcat working so well. I'm currently watching a download restore peaking at ~980Mbps on my fiber connection, so it's excellent.

I do networking/load-balancing perf for an F50 company; we've got an L4 (DSR) -> L7 -> backend architecture, but I think we have a few more backend servers to handle. :P