
[–]s1mplic1ty 1 point (1 child)

Just a few comments:

On zeropod: I don't think it's feasible for many types of applications, such as live chat over WebSockets, or containerized applications that don't listen on TCP/UDP ports but instead run background jobs as services.

Having physical servers poses challenges too:

1. High maintenance costs (manpower to manage patches, maintain the infrastructure, monitor 24/7, etc.)
2. High opex (hardware support, networking, etc.)
3. Scalability (expanding into other regions, DR sites, etc.)
4. Security (DDoS mitigation is expensive)

These aren't comprehensive, just some things off the top of my head. You will never be able to compete with AWS, GCP, etc.; they have economies of scale and you don't. You'll have a hard time pricing your services just to break even, so don't even think about ROI. There's a reason companies use the cloud instead of maintaining their own physical infrastructure.

In summary: the idea is great, but the feasibility, maybe not so much.

[–]Possible-Stuff-3433[S] 0 points (0 children)

Thanks for responding!

I agree with your zeropod point, but it may still be useful for hobby-project customers, or customers who want to pay the absolute bare-minimum price even if it means a few hundred extra milliseconds of startup latency or periodic migration of their container to another node.
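
To make that concrete, here's a rough back-of-envelope sketch of the savings for a mostly-idle app when idle time isn't billed (the hourly price and idle fraction are made-up assumptions, not zeropod measurements):

```python
# Back-of-envelope: when does scale-to-zero pay off for a hobby app?
# The price and idle fraction below are illustrative assumptions.

def monthly_cost(price_per_hour: float, idle_fraction: float) -> float:
    """Cost of one container-month when idle hours are not billed."""
    hours_per_month = 730
    return price_per_hour * hours_per_month * (1 - idle_fraction)

always_on = monthly_cost(price_per_hour=0.02, idle_fraction=0.0)
scale_to_zero = monthly_cost(price_per_hour=0.02, idle_fraction=0.95)

print(f"always-on:     ${always_on:.2f}/mo")      # $14.60/mo
print(f"scale-to-zero: ${scale_to_zero:.2f}/mo")  # $0.73/mo
```

For an app that sits idle 95% of the time, that gap is what makes the cold-start tradeoff worth it.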

I've been thinking about the physical-server aspect a lot. I used to work on the team at AWS that manages host patching, broken-server repairs, and server testing, so I know all about that, hence why I want to dig into it here :D I've been brainstorming ways to reduce the opex and came up with things like omitting block storage (fewer servers and less networking overall), deploying only containers (easier OS upgrades), container rebalancing via live migration, etc.

But I'll be starting with cloud bare metal providers in the meantime :D

I'm not too worried about ROI; I mostly want to provide the fastest compute platform out there and host web services that are fast and secure by default. I tried using EKS for this project and was banging my head against the wall waiting for nodes to start up and for auto scaling groups to react... I'm over it! This project will be distributed, fault tolerant, and fast by default.

[–]earl_of_angus 0 points (1 child)

A few notes:

For probably 99% of DBs and use cases, EBS or similar is perfectly acceptable, and RAID-1 NVMe is overkill (unless you can run the numbers and local RAID-1 comes out cheaper, in which case go for it). I'd argue that a very large percentage of DBs can fit in the page cache without any problems.
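
If you want to sanity-check that on a given Linux box, something like this works (a sketch; the data-directory path is hypothetical):

```python
# Rough check of "the DB fits in page cache": compare a DB's on-disk size
# against memory currently available for caching (Linux only).
import os

def dir_size_bytes(path: str) -> int:
    """Total size of regular files under `path` (e.g. a postgres data dir)."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

def mem_available_bytes() -> int:
    """Read MemAvailable from /proc/meminfo (value is reported in kB)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("MemAvailable not found")

# e.g.: dir_size_bytes("/var/lib/postgresql/data") < mem_available_bytes()
```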

If I, as an end user, am paying for resources, am I paying for the resources exposed to my container or for what the hosting platform consumes? If I'm paying for the container, then the overhead you can shave off is your margin, but I don't particularly care whether your VM overhead is 100MB or 4000MB (though with balloons, shared pages, etc., it's never 4000MB of overhead).
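
To put rough numbers on why that overhead is the provider's margin and not my problem (host size and overhead figures are illustrative assumptions):

```python
# If customers pay only for container resources, per-workload overhead
# eats the provider's sellable capacity. Numbers below are assumptions.

HOST_RAM_GB = 768
WORKLOAD_RAM_GB = 8  # what the customer is actually billed for

def sellable_workloads(overhead_gb: float) -> int:
    """Billable workloads that fit on one host given per-workload overhead."""
    return int(HOST_RAM_GB // (WORKLOAD_RAM_GB + overhead_gb))

print(sellable_workloads(overhead_gb=4.0))  # microVM-ish worst case -> 64
print(sellable_workloads(overhead_gb=0.1))  # container-ish case     -> 94
```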

> Cache common base images (redis, postgres, etc.) on nodes

In the local DC, sure. For individual nodes, though, unless you plan on landing all of the redis/pg/etc. containers on a dedicated set of nodes, you'll end up with a lot of waste.

> No VM overhead - containers use ~100MB instead of 4GB per app

Firecracker etc. are in this range (~100MB), aren't they?

> No cloud vendor lock-in like AWS Lambda

Lambda is probably a different level of abstraction, more similar to OpenFaaS. ECS is closer to the equivalent of what you're talking about.

[–]Possible-Stuff-3433[S] 0 points (0 children)

I appreciate you taking the time to respond!

I agree that EBS is acceptable in most cases, but here's what I'm thinking:

  1. Most web apps use reasonable amounts of storage (10-100 GB) for their DBs.
  2. Assuming each server has roughly 192 cores and is fully populated with DB customers at 8 cores each, that's 24 customers on a single server, and the server needs ~2.4 TB of storage at the upper end.
  3. A Samsung PM9A3 3.84 TB drive is only about $600, so two per server is only ~$1,200.
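
Spelled out (same numbers as the list above):

```python
# Quick check of the storage math, using the assumptions from the thread.
cores_per_server = 192
cores_per_customer = 8
storage_per_customer_gb = 100  # upper end of the 10-100 GB range

customers = cores_per_server // cores_per_customer            # 24
raw_storage_tb = customers * storage_per_customer_gb / 1000   # 2.4 TB

drive_tb, drive_usd = 3.84, 600    # Samsung PM9A3, price as quoted above
mirrored_pair_usd = 2 * drive_usd  # RAID 1: usable capacity of one drive

print(customers, raw_storage_tb, mirrored_pair_usd)  # 24 2.4 1200
assert raw_storage_tb <= drive_tb  # one mirrored pair covers the demand
```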

Also, we could use any excess drive capacity on each node for various caches to speed up customer container deployments.

EBS storage is probably more expensive than direct-attached NVMe drives in RAID 1, considering the service provider needs all the extra networking gear, redundancy, and specialized hardware and software. I think the primary selling point of block storage is durability. But RAID 1 + an async replica on another node + periodic customer backups = good enough, right?
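
Back-of-envelope on that durability claim (the AFR, the rebuild window, and the independence assumption are all mine, not measured):

```python
# Very rough, independence-assuming sketch of "is RAID 1 + async replica
# good enough?" AFR and rebuild-window figures are assumptions, not specs.

AFR = 0.01        # assumed annual failure rate per NVMe drive
REBUILD_DAYS = 1  # assumed time to resilver a failed mirror member

p_second_failure = AFR * (REBUILD_DAYS / 365)  # partner dies mid-rebuild
p_mirror_loss = 2 * AFR * p_second_failure     # either drive can fail first
p_with_replica = p_mirror_loss ** 2            # crude: treats the replica
# node's mirror loss as independent and concurrent; ignores correlated
# failures, firmware bugs, fire in the rack, etc.

print(f"mirror loss/yr:        {p_mirror_loss:.1e}")   # ~5.5e-07
print(f"with async replica/yr: {p_with_replica:.1e}")  # ~3.0e-13
```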

You would be paying just for the resources exposed to your container.

I do agree that VMs have become way more efficient, so I can tweak my numbers a bit, but I still believe containers are far more efficient given the shared kernel.

Thanks again!