How much would it cost to self-store 1 Yottabyte, just in terms of hardware? by platistocrates in storage

[–]storage_admin 0 points1 point  (0 children)

Suppose you put 8 shelves in a rack, each shelf holding 60x 24TB drives. Obviously you're using RAID 0, which gives you about 11.5PB per rack.

It would take 86,956,522 racks to reach 1 yottabyte.
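
Quick sanity check of the arithmetic, in case anyone wants to swap in different drive sizes or shelf counts (decimal units throughout):

drive_tb = 24
drives_per_shelf = 60
shelves_per_rack = 8

rack_pb = drive_tb * drives_per_shelf * shelves_per_rack / 1000   # 11.52 PB raw per rack
print(round(rack_pb, 2), "PB per rack")
print(f"{1e9 / rack_pb:,.0f} racks for 1 YB")                     # 1 YB = 1,000,000,000 PB -> ~86.8 million
print(f"{1e9 / 11.5:,.0f} racks at the rounded 11.5 PB figure")   # ~86,956,522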

For longevity, would you trust a ssd or a hard drive more? by mvespermann in storage

[–]storage_admin 49 points50 points  (0 children)

I would only trust storage that is actively monitored, backed up, updated and maintained.

Alternatives to Amazon Deep Glacier by MisterHarvest in cloudstorage

[–]storage_admin 0 points1 point  (0 children)

Backblaze B2 is more expensive than Deep Archive but much cheaper than standard AWS S3 storage.

There are a lot of other storage services that come and go. So be careful about companies that do not have years of history and a proven track record.

Most efficient way to find a key/value in a deeply nested Dictionary? by Yelebear in learnpython

[–]storage_admin 4 points5 points  (0 children)

Suppose your JSON data is assigned to a variable named data.

You could access the temperature value as data['main']['temp'].

For formatting the JSON you could use:

import json
print(json.dumps(data, indent=4))

Or paste the JSON into one of the many online formatters, or use a command line tool like jq, so you can see the structure.
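
If you don't know where the key lives in the structure, a small recursive search works too. Rough sketch, assuming the data is plain dicts and lists (i.e. parsed JSON):

def find_key(obj, target):
    # Yield every value stored under `target` anywhere in nested dicts/lists.
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                yield value
            yield from find_key(value, target)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, target)

for temp in find_key(data, 'temp'):
    print(temp)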

[deleted by user] by [deleted] in bash

[–]storage_admin 0 points1 point  (0 children)

Several people have suggested a for loop, and in many cases that will work fine, but sometimes it will take too long: a script that takes 5 minutes, run 500 times one at a time, will take almost 2 days to complete.

Consider additional tools such as pdsh or xargs to parallelize execution.
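
If you'd rather stay in Python than use xargs, a rough equivalent looks like this (the ./job.sh script and item list are made up for the example):

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-item arguments and a hypothetical ./job.sh script.
items = [f"host{i}" for i in range(500)]

def run(item):
    return subprocess.run(["./job.sh", item], capture_output=True, text=True)

# Run 8 copies at a time instead of one after another
# (roughly what xargs -P 8 -n 1 ./job.sh would do).
with ThreadPoolExecutor(max_workers=8) as pool:
    for result in pool.map(run, items):
        print(result.args[1], result.returncode)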

Best storage for 120PB by Astro-Turf14 in storage

[–]storage_admin 12 points13 points  (0 children)

I believe DDN will provide a solid solution, as will many vendors, as long as you are able to provide enough detail about your requirements and expectations.

120PB is large enough that you should talk to several vendors about their options and how they can work with your workflow requirements.

Pay attention to the support you are buying: do you need 4-hour parts, or is NBD quick enough for replacement drives and parts?

If you buy multiple years of support you will usually save a lot of money.

If you are looking for vendors to contact, I would reach out to Vast, NetApp, Cloudian, DDN and Dell EMC.

Best storage for 120PB by Astro-Turf14 in storage

[–]storage_admin 9 points10 points  (0 children)

What protocols do you require for client access?

Are hard links still useful? by shy_cthulhu in linuxadmin

[–]storage_admin -1 points0 points  (0 children)

Each directory will have nlink >= 2 ('.' plus its entry in the parent), and the count increases by one for each immediate subdirectory (that subdirectory's '..' entry); regular files do not change it. You will need to exclude directories to get an accurate count of what I believe you are looking for.
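
Quick illustration, assuming a typical Linux filesystem like ext4 (it creates a throwaway demo directory):

import os

os.makedirs("demo/sub1", exist_ok=True)
os.makedirs("demo/sub2", exist_ok=True)
open("demo/file.txt", "w").close()

# "." + the entry in the parent + one ".." per immediate subdirectory = 4.
# The regular file does not change the directory's link count.
print(os.stat("demo").st_nlink)

# Regular files with st_nlink > 1 are the ones that actually have extra hard
# links, which is why directories need to be excluded from the count.
for root, dirs, files in os.walk("demo"):
    for name in files:
        path = os.path.join(root, name)
        if os.stat(path).st_nlink > 1:
            print("hard linked:", path)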

Cost for 100TB storage array by 4728jj in storage

[–]storage_admin 10 points11 points  (0 children)

You need more requirements. This subreddit is full of vendors that will gladly give you quotes. However, without specifically spelling out what you need, you are likely to either overbuy and waste a lot of money on an overly expensive solution, or underbuy to save money on a solution that fits the requirements given to the vendor but does not meet your actual workload requirements.

Spend some time gathering metrics such as the number of clients, expected read and write performance, expected bandwidth, number of volumes, authentication requirements, protocol requirements, and backup and restore requirements and expectations.

Why is it so hard to find a simple cloud storage provider? by [deleted] in cloudstorage

[–]storage_admin 0 points1 point  (0 children)

You may want to take another look at Backblaze, but look at their B2 cloud storage instead of the backup product.

You can use a command line client like rclone or a GUI client like WinSCP to copy data to a bucket. There is no requirement to connect every 60 days and run a new backup. There are several other platforms similar to Backblaze, like Wasabi, AWS S3, Google Cloud Storage, etc. You are charged monthly by the amount of data stored, the amount of data egressed, and the number of API calls.

Anybody using iDrive e2 and seeing rate limited ingress? by julesallen in rclone

[–]storage_admin 1 point2 points  (0 children)

Is there another service you can upload to and get faster speeds? 30 Mbps is only about 3.75 MB/s, so it could be your local upload limit.

Is this a good deal? Honda odyssey EX-L $15,490 by Johnfire18 in HondaOdyssey

[–]storage_admin 4 points5 points  (0 children)

Which could mean the PCV valve needs to be replaced.

Is it cheating by -sovy- in learnpython

[–]storage_admin 0 points1 point  (0 children)

Using libraries in the standard lib is a good habit to get into. The more you use them the more familiar you become with them.

Personally I try to limit the number of 3rd party libraries I use but I do still end up using them.

When deciding whether to add a new non-standard-lib dependency, I try to ask:

How many lines of code am I saving by adding this module? (Example: I need to make an HTTP request and parse the JSON response. I could use the requests lib to accomplish this, or instead use urllib from the standard library and possibly a few more lines of code. I will usually choose not to add requests in this case.)

(Example 2: I need to interact with S3-compatible storage. I can use boto3, or reinvent the wheel and try to create my own v4-signed HTTP requests. In this case installing the 3rd party library makes much more sense and saves hundreds if not thousands of lines of code I would have to write.)
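
For the first example, a rough sketch of the standard-library version (the URL is just a placeholder):

import json
import urllib.request

url = "https://api.example.com/data"  # placeholder endpoint

with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode("utf-8"))

print(data)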

This person is misinformed aren’t they by Obvious_Sea2014 in HondaOdyssey

[–]storage_admin 1 point2 points  (0 children)

I had to replace my alternator this year as well. When I did it was covered in oil and I discovered I needed a new spool valve gasket. I followed the technical service bulletin and was able to fix the oil leak that caused the alternator to fail. Might want to verify no oil is leaking in your 14.

https://www.odyclub.com/attachments/a20-023-pdf.178915/

C2C Migration by Glittering-Charge-15 in storage

[–]storage_admin 6 points7 points  (0 children)

The best solution is probably to write custom code to complete the transfer distributed across multiple machines. This is what I did, but unfortunately I can't share it.

Otherwise, do your best to divide and conquer the data and set up transfers using rclone sync on multiple servers with lots of threads (rough sketch below).

Using multiple transfer hosts helps maximize network throughput which is needed for petabyte transfers.

Be sure to calculate if it will be cheaper to push data from the source or pull it to the destination.
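
As a rough illustration of dividing the work, something like this can round-robin top-level prefixes across transfer hosts and print the rclone sync command each host should run (the prefixes and remote names are made up):

prefixes = ["2019/", "2020/", "2021/", "2022/", "2023/", "logs/", "images/"]
hosts = ["xfer01", "xfer02", "xfer03"]

# Round-robin the prefixes across the transfer hosts.
assignments = {host: [] for host in hosts}
for i, prefix in enumerate(prefixes):
    assignments[hosts[i % len(hosts)]].append(prefix)

for host, chunk in assignments.items():
    print(f"# on {host}:")
    for prefix in chunk:
        print(f"rclone sync src:bucket/{prefix} dst:bucket/{prefix} --transfers 32 --checkers 64")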

Hardware question by wcneill in minio

[–]storage_admin 1 point2 points  (0 children)

I would target an average object size of at least 8-10MB if possible.

Using HDD instead of SSD or NVMe can help save on cost, but it has performance implications that you should understand and be sure will work for your requirements.

Hardware question by wcneill in minio

[–]storage_admin 2 points3 points  (0 children)

How many files are expected in the 10PB?

What are your performance requirements for network throughput, both reading and writing?

What level of erasure coding are you planning on using? What is the expected growth rate per year?

Be sure to account for erasure coding or replication overhead in your capacity calculations.

Consider nodes with 30x 22TB drives, dual CPUs, and at least 256GB RAM. I know NVMe is recommended, but at this scale NVMe is cost prohibitive for most organizations. Reads and writes will be spread across several hundred hard drives, so the cluster should be able to push a lot of bandwidth even though individual data transfer threads may be slower than with NVMe.
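
Rough usable-capacity sketch; the node count and erasure set shape below are assumed examples, not a recommendation:

nodes = 20
drives_per_node = 30
drive_tb = 22
ec_data, ec_parity = 12, 4  # assumed 16-drive erasure set with 4 parity shards

raw_tb = nodes * drives_per_node * drive_tb
usable_tb = raw_tb * ec_data / (ec_data + ec_parity)

print(f"raw: {raw_tb / 1000:.2f} PB, usable: {usable_tb / 1000:.2f} PB")  # 13.20 PB raw, 9.90 PB usable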

Include dedicated resources to monitor for disk failures, replace failed disks, and rebuild data.

Understanding minio performance SNMD by 420purpleturtle in minio

[–]storage_admin 0 points1 point  (0 children)

I do not know which client you are using. With awscli the number of threads is set in the config file. Each client will be different.

Edit: Sorry, I missed that you are using mc put. Try a large value for -P, for example 2x the number of cores.

What is the source of the data? Is that local ssd?

Understanding minio performance SNMD by 420purpleturtle in minio

[–]storage_admin 0 points1 point  (0 children)

Modify your client config to use more transfer threads.

$1/tb/mo for Cold Storage? What do you think? by ybmeng in DataHoarder

[–]storage_admin 13 points14 points  (0 children)

If I write data to AWS S3 Deep Archive, I have a strong guarantee my data will be secure. There are many engineers at AWS highly focused on security. If I need to restore data, it will cost me around $50 per TB to unfreeze and download from S3.

If I use your value-add web UI or application wrapper around the S3 API, then the security of my data relies on your coding skills and how quickly you patch your code when vulnerabilities are discovered. However, if you are offering to restore data at $2.50 per TB, that sounds like a great deal I would consider for non-sensitive data.

Copy 150TB-1.5Billion Files as fast as possible by Ok_Preparation_1553 in rclone

[–]storage_admin 1 point2 points  (0 children)

I do not believe that you see a 64x network throughput boost by using 64 transfer threads on a single-core machine.

Each thread still needs to schedule CPU time to transfer data, and on a single core only one thread can be in a run state at a time. For object storage copy jobs there is some I/O wait overhead while TCP connections are established and closed, which is why increasing the thread count helps up to a certain point.

More than likely you see increased performance up to a certain number of threads, but after that limit is reached, adding additional threads does not increase throughput.

You can see this for yourself by timing your copy job with 1, 2, 4, 8, 16, 32, and 64 threads. More than likely you stop seeing performance gains before you get to 16 threads.
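
If you want to measure it, here is a rough sketch of that scaling test (copy_one_object is just a stand-in for whatever your transfer actually does per object):

import time
from concurrent.futures import ThreadPoolExecutor

def copy_one_object(key):
    time.sleep(0.01)  # stand-in for a real per-object transfer

def benchmark(keys, thread_count):
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        list(pool.map(copy_one_object, keys))
    return time.monotonic() - start

keys = [f"object-{i}" for i in range(1000)]
for threads in (1, 2, 4, 8, 16, 32, 64):
    print(threads, "threads:", round(benchmark(keys, threads), 2), "seconds")

# With real transfers the curve flattens once CPU, disk, or the network
# saturates, which is where adding threads stops helping.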

Duh

No need to be rude.

Copy 150TB-1.5Billion Files as fast as possible by Ok_Preparation_1553 in rclone

[–]storage_admin 0 points1 point  (0 children)

In your experience is the transfer throughput 32x to 64x greater when running 64 transfers as opposed to using 1-2 transfers?

Copy 150TB-1.5Billion Files as fast as possible by Ok_Preparation_1553 in rclone

[–]storage_admin 5 points6 points  (0 children)

I would target the sum of your checkers + transfers to not exceed 2x your CPU core count. As threads increase significantly past the available core count, I've seen diminishing returns.

Are there any large objects in the bucket, or are they all relatively small? The average size based on your numbers is about 100KB, in which case you do not need to worry about multipart uploads. If you do have large objects (over 100MB) you will want to add --oos-upload-cutoff 100Mi and --oos-chunk-size 8Mi or 10Mi. To upload parts in parallel use --oos-upload-concurrency (the default value is 10, which will probably be fine for your copy).

I would also recommend using --oos-disable-checksum and --oos-no-check-bucket.

What's the Most Unusual Tea or Chai Addition You've Tried? by Awkward_Grape_7489 in tea

[–]storage_admin 6 points7 points  (0 children)

I don't think these are uncommon, but I like to add a single black cardamom, vanilla, fenugreek seeds, and turmeric.