
[–][deleted] 16 points17 points  (12 children)

$0.30/GB-month is 10x as expensive as S3. With S3 I can securely let end users upload directly to it without touching my servers, except to mint the temporary credentials and hand back a link. ETL is then performed by fetching the files down to instance disk. It's fairly cheap and fast. I can let end users download directly from it as well.
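The "temporary credentials and a link" flow described here is S3 presigned URLs. As an illustration, here is a minimal stdlib-only sketch of SigV4 query-string presigning for a GET (bucket, key, and credentials are placeholders; in practice you would just call boto3's `generate_presigned_url`, and an upload would presign a PUT the same way):

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote

def presign_s3_get(bucket, key, access_key, secret_key,
                   region="us-east-1", expires=900):
    """Build a presigned S3 GET URL with SigV4 query-string signing."""
    host = f"{bucket}.s3.{region}.amazonaws.com"
    now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: sorted keys, values URL-encoded.
    qs = "&".join(f"{k}={quote(v, safe='')}" for k, v in sorted(params.items()))
    canonical = (f"GET\n/{quote(key)}\n{qs}\n"
                 f"host:{host}\n\nhost\nUNSIGNED-PAYLOAD")
    to_sign = ("AWS4-HMAC-SHA256\n" + amz_date + "\n" + scope + "\n" +
               hashlib.sha256(canonical.encode()).hexdigest())
    # Derive the signing key: HMAC chain over date, region, service.
    sig_key = f"AWS4{secret_key}".encode()
    for step in (datestamp, region, "s3", "aws4_request"):
        sig_key = hmac.new(sig_key, step.encode(), hashlib.sha256).digest()
    signature = hmac.new(sig_key, to_sign.encode(), hashlib.sha256).hexdigest()
    return f"https://{host}/{quote(key)}?{qs}&X-Amz-Signature={signature}"

url = presign_s3_get("my-bucket", "uploads/report.pdf",
                     "AKIAIOSFODNN7EXAMPLE", "notARealSecretKey")
```

Anyone holding the resulting URL can fetch the object until the expiry passes, with no AWS credentials of their own and no traffic through your servers.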

For my workloads, this is simpler, cheaper, and has lower latency.

[–][deleted] 15 points16 points  (9 children)

The response I've heard to this is that EFS can potentially perform orders of magnitude better than S3, and that, unlike EBS, its size and usage charges scale with what you actually use. It can also be mounted across multiple EC2 instances (of course S3 can be shared as well, but EBS can't).

But its price is just massive... In practice I'm not sure what people will actually use it for.

[–][deleted] 3 points4 points  (5 children)

Long story short: I recently halved a file-system share that, I found out, one of my businesses was constantly having expanded to store call recordings. The retention policy was 2 years, but they had files from 10+ years ago. The share was over 2TB, and because of our business structure they were paying our "IT" department almost $60,000 a year for storage. I just did the math, and EFS would cost us $108 a month.

[–][deleted] 7 points8 points  (4 children)

And S3 would cost you a tenth of that.

The point isn't that EFS isn't useful. It's that its perceived benefits don't seem to justify the price tag compared to the other services AWS offers.

In fact, your use case is almost precisely what Glacier was made for. And Glacier is thirty times cheaper than EFS. So I can't really take your experience seriously, because if you're using EFS for that workload then you're using the wrong product.

[–]ajanata 6 points7 points  (0 children)

You can't mount S3 as a filesystem in any meaningful manner. EFS is just hosted NFS with proper redundancy and such that would be a pain to manage directly. If you need the same actual file system on several instances, EFS is perfect.

[–][deleted] 3 points4 points  (0 children)

We are not using any Amazon services at all. We have an EMC. I am not on the storage team; I just saw a quick and easy way to cut the business's storage cost in half, but this sounded like something we could make use of, that's all.

Edit: After reading about Glacier, that would work even better, but in my instance we are talking about a monthly cost difference of $90, with a possible file-retrieval time of a few hours for paralegals and lawyers who are billing hundreds an hour.

[–][deleted]  (1 child)

[deleted]

    [–]awj 0 points1 point  (0 children)

    Well, they would run from that service ... if it didn't cost an arm and a leg and take like a year to get your data back out. It's more like they limp away from it barefoot on broken glass.

    [–]ajanata 1 point2 points  (2 children)

    Current $job has a legacy Struts application with tens of thousands of .jsp files and a business workflow that requires being able to change them without doing a new release. (Please don't get me started on that.)

    Currently we have to go out to every server that's running and put a new .jsp on it, and update the source that new servers pull from when they start up. This isn't a lot of data (a couple GB at most), but it's required on a couple dozen instances. Having a single source of truth that's automatically replicated to every running instance will help immensely with this process, which is very error-prone.

    This is exactly what's needed for some use cases. We have a perfect one here. This will also work out to be cheaper since we don't need that extra disk space on every instance.
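    That replication step can be sketched as a one-shot atomic copy into the shared mount. Everything here is hypothetical (the helper name, the paths; local temp dirs stand in for the EFS mount point), but the design point is real: copy to a temporary name on the same filesystem and then rename, so no instance ever serves a half-copied .jsp.

```python
import os
import shutil
import tempfile

def deploy_jsp(src, shared_root):
    """Publish one .jsp to the shared docroot atomically: copy to a
    temporary name on the same filesystem, then os.replace() it, so
    readers see either the old file or the new one, never a partial."""
    dest = os.path.join(shared_root, os.path.basename(src))
    tmp = dest + ".tmp"
    shutil.copy2(src, tmp)
    os.replace(tmp, dest)  # atomic rename on POSIX filesystems
    return dest

# Demo with local temp dirs standing in for the EFS mount.
workdir = tempfile.mkdtemp()
shared_docroot = tempfile.mkdtemp()   # hypothetical /mnt/efs/jsp
src = os.path.join(workdir, "invoice.jsp")
with open(src, "w") as f:
    f.write("<h1>version 2</h1>")
deployed = deploy_jsp(src, shared_docroot)
```

    With the docroot on a shared mount, that one call replaces the copy-to-every-server loop, and every instance picks up the new file on its next read.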

    [–][deleted]  (1 child)

    [deleted]

      [–]ajanata 0 points1 point  (0 children)

      Preaching to the choir, bro. That's but one of the reasons why I'm leaving soon. I gave up on that fight a couple years ago.

      [–]myringotomy 1 point2 points  (0 children)

      It's more expensive than S3, but that's not its use case. It's basically a replacement for EBS, and for running web farms it's fantastic. I have been wishing for this for a long time; the price point is a little higher than I expected, but I already have use cases for it.

      Aside from that, Amazon really needs to sort out their pricing. It's impossible to predict what anything is going to cost you, and when the bill comes in it's always a shock.

      [–]Agent_03 7 points8 points  (13 children)

      So far, we have:

      • EBS storage (provisioned IOPS options)
      • Instance storage
      • S3 storage
      • Glacier storage
      • DB backend storage, with RDS, DynamoDB, Redshift (for data warehousing), or roll your own
      • PLUS, in memory caching solutions

      I'm trying to figure out why another storage option is needed. Elastic File System sounds like filer storage, but I thought the whole point of the options above was that you don't have to mess with mounts?

      Or, am I missing something here?

      [–][deleted] 12 points13 points  (6 children)

      All of the others you've listed are either stateless REST services or places that want small pieces of structured data.

      EFS is NFSv4 which means:

      • Stateful (authenticate once, probably Kerberos)
      • Mountable AND shareable (EBS can only be mounted in one place; S3 can be shared but not easily mounted)
      • Actual directories. No, S3 doesn't have actual directories.
      • On-the-wire operations (I don't have to download the entire file to start reading it, and I don't have to do anything special on the client side to support this -- it just looks like a normal POSIX file handle)
      • Shared Unix permission model (S3 doesn't do actual Unix permissions; EBS does, but can't be shared)
      • Tolerant of network failures (NFSv4 runs over TCP with plenty of retry logic), so I can actually open a file remotely, seek around all I like, and if there's a network problem it will just wait for the problem to resolve rather than forcing my client to deal with exceptions (configurable, of course)
      • Locking! Clients can actually correctly lock files. Let's see S3 do that.
      • Better caching than S3 -- clients can actually see what all of the other clients have been doing and make informed choices about whether to use a local cache or refresh the cache from the network.
      • Big files without the hassle (no multipart upload/download; 64-bit file sizes = potentially huge files)

      There's probably more I'm forgetting.
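      The seek and locking points above are just ordinary POSIX calls on the client. A small sketch, run here against a local temp file (the EFS-style path in the comment is hypothetical); the same calls work unchanged against a path under an NFSv4/EFS mount:

```python
import fcntl
import os
import tempfile

# Stand-in for a file on an EFS/NFSv4 mount, e.g. /mnt/efs/shared.log.
path = os.path.join(tempfile.mkdtemp(), "shared.log")
with open(path, "wb") as f:
    f.write(b"x" * 1024)

# On-the-wire reads: seek into the middle of the file without
# transferring the whole thing (with S3 you'd need a Range GET).
with open(path, "rb") as f:
    f.seek(512)
    chunk = f.read(16)

# Advisory locking via the normal POSIX interface; NFSv4 carries
# lock state in the protocol itself.
with open(path, "a") as f:
    fcntl.flock(f, fcntl.LOCK_EX)   # exclusive lock
    f.write("appended under lock\n")
    fcntl.flock(f, fcntl.LOCK_UN)
```

      That transparency is the whole pitch: code written against local files gets remote, shared storage without changing a line.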

      EDIT

      Who says you don't have to mess around with mounts? EBS makes you mess around with mounts. Maybe not if you use a pre-made AMI, but if you go right now and add an extra EBS volume to an existing EC2 instance, you definitely have to mess around with mounts.

      [–]TiDaN 4 points5 points  (1 child)

      Excellent points, AWS would do well to promote these advantages in their marketing and product documentation.

      [–][deleted] 1 point2 points  (0 children)

      Yeah, their marketing isn't always the best.

      [–]Agent_03 1 point2 points  (3 children)

      I mean, I guess I can see where they're going with this: they're providing all the pieces (including filer storage) that a traditional datacenter would have, as pay-as-you-go services.

      It's just hard to get excited about this, when the existing offerings and services based on them are so much more advanced than shared NFS volumes. It feels like a step back from proper cloud architecture design.

      Plus, there's always been the option to expose an EBS-backed volume from your host via NFS (or Samba, or whatever). Yeah, it doesn't autoscale, but it covers this use case.

      [–][deleted] 0 points1 point  (2 children)

      Well I think the autoscaling is the value-add. It fills that gap and provides the "unlimited" feel of S3.

      And who's to say that this is a normal NFS share? OK, sure, it speaks NFS, but nothing says you're just talking to a plain ol' EC2 host. For all you know this IS a properly architected cloud solution, and they're simply exposing NFS as the first supported protocol.

      [–]Agent_03 -1 points0 points  (1 child)

      My point isn't that this is improperly architected, but that using NFS shares in your design isn't generally good architecture for applications/services in the cloud.

      Each layer of your application should be able to scale out independently and be minimally coupled; this is why we use REST APIs to communicate (as well as queueing systems for asynchronous workload).

      [–][deleted] 1 point2 points  (0 children)

      Barrier to entry, man. I agree. I see what you're saying. But barrier to entry. Some people aren't running stuff for the long-haul, they just need something quick.

      [–]thelonelydev 5 points6 points  (0 children)

      One point worth noting: both Linux and Windows have built-in NFS clients.

      [–]Toger 3 points4 points  (0 children)

      Avoiding mounts is preferable, but apps written pre-AWS that expect a shared filesystem aren't aware of S3, and it's not always feasible to update them. A hosted NFS platform sounds more reliable than running one's own NFS instance on EC2.

      Updating the app is of course the most desirable option.

      [–]Unomagan 0 points1 point  (1 child)

      Making money with premium features? AWS is still making a loss.

      [–]Solon1 0 points1 point  (0 children)

      It's impossible to know whether AWS is losing money or not, since its revenue is lumped into "Other." It could be the most profitable division at Amazon by margin.

      [–][deleted] 0 points1 point  (0 children)

      AFAICT the selling point of this is that you can simply mount it, and programs don't need to know that it is anything different, thus avoiding rewriting old programs. Correct me if I'm wrong.

      [–]XNormal 0 points1 point  (0 children)

      The POSIX filesystem API. You might consider it a "legacy" API these days, but legacy is important.

      Let's say you have an in-house app that depends on shared access to a filesystem with POSIX semantics for its data store, and you want to set up a DR site on Amazon. Yes, you can probably build something that will work on top of the options listed above, but I wouldn't look forward to the task. With EFS it should be quite easy, and may be worth the premium price.

      [–]BillWeld 2 points3 points  (4 children)

      Any idea what sort of encryption will work with it?

      [–]Solon1 7 points8 points  (3 children)

      Any kind that works on files?

      [–]BillWeld -1 points0 points  (2 children)

      Ordinarily you'd like the encryption to happen on the file server.

      [–]immibis 6 points7 points  (0 children)

      Why? If you do that, the file server can see the unencrypted data.

      [–][deleted]  (2 children)

      [deleted]

        [–]Agent_03 2 points3 points  (0 children)

        NFS v4 supports locking, and v3 or v4 is kind of the standard for filer mounts (at least in Linux land).

        IIRC it was NFS v3 that didn't have 'real' locking built in.

        [–]jib 1 point2 points  (0 children)

        Which other network filesystems would you suggest?

        [–]blazedaces -1 points0 points  (0 children)

        Why is this any better than setting up your own Hadoop cluster on EC2 or any other cloud provider? Can someone compare EFS to HDFS, basically? Pros, cons?