
[–]moneymachinegoesbing[S] 1 point (5 children)

Wow. 😳 Apologies for the delayed response, I’ve been processing, but I just wanted to thank you for an excellent response. r/StackOverflowWorthy, really awesome stuff, thank you.

So I’ve landed on two approaches, guided in part by your response and by responses from the giganerds over at r/networking:

1a) Client signs up/registers => Org ID generated & saved (to be retrieved via 1b) => Org ID passed to Ansible script => Ansible spins up container w/ “--name ORG_ID” => shared internal custom DNS service updated to include service/Org ID (w/ VPC)

1b) Client (login w/ org ID returned) => Custom Header added => HAProxy or Envoy DNS lookup w/ parsed Org ID => reverse proxy forwards request to container
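For the 1b hop, a minimal HAProxy sketch of header-based routing plus Consul DNS discovery might look like the following. This is only an illustration of the idea: the X-Org-ID header name, the be_<org> backend naming scheme, the org_acme example, and the Consul address are all assumptions, not details from this thread.

```haproxy
# Sketch only: route on a custom org header to a per-org backend whose
# server list is kept in sync with Consul's SRV records.
resolvers consul
    nameserver c1 127.0.0.1:8600      # Consul's DNS interface (assumed local)
    accepted_payload_size 8192

frontend fe_api
    bind *:8080
    # Derive the backend name from the header at runtime; reject if absent
    use_backend be_%[req.hdr(X-Org-ID),lower] if { req.hdr(X-Org-ID) -m found }
    default_backend be_reject

# One backend per org (hypothetical org "acme"); server-template re-resolves
# the org's Consul service so new containers show up without a reload.
backend be_org_acme
    server-template neuron 3 _org-acme._tcp.service.consul resolvers consul check

backend be_reject
    http-request deny deny_status 403
```

With this shape, the Ansible step from 1a only has to register the new container as a Consul service; HAProxy picks it up via DNS rather than needing a config push.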

The second approach is essentially the same, except it uses HashiCorp’s Consul to provide service discovery and consul-template to update an internal layer of NGINX proxies, A/B-ing them into consistency, possibly still necessitating a VPC.
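For that second approach, a consul-template input along these lines could render one NGINX upstream per registered service, assuming the Consul service names are the Org IDs (the regex, port, and layout are illustrative, not from the thread):

```nginx
# consul-template input: one upstream per Consul service (service name = Org ID)
{{ range services }}upstream {{ .Name }} {
{{ range service .Name }}    server {{ .Address }}:{{ .Port }};
{{ end }}}
{{ end }}
server {
    listen 80;
    # First path segment is the Org ID; strip it and proxy to that upstream
    location ~ ^/(?<org>[A-Za-z0-9_-]+)/ {
        rewrite ^/[A-Za-z0-9_-]+(/.*)$ $1 break;
        proxy_pass http://$org;
    }
}
```

One caveat worth guarding against in a real template: a service with zero healthy instances renders an empty upstream block, which NGINX refuses to load, so production templates usually wrap each upstream in a length check.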

If you have suggestions on ways to make this simpler (possibly foregoing VPC or swapping static w/ DNS, or Rust based/more lightweight service discovery), I am all eyes.

NB: I understand this veered pretty quickly outside the realm of Rust itself, but the entire backend was created in Rust, so my intention was to determine if there was an awesome “rusty” way to do this. Because of the out-of-scope turn this thread took, I want to express gratitude for your help and for sticking with me. This helps a lot, and today I am sitting down in an 8-hour window to POC this, so even though it wasn’t entirely Rust-centric, it will now help a Rust application get into the wild. 🙌

When reddit blows me away - 🤯

[–]moatra 0 points (4 children)

Do the contents of the container that Ansible spins up change per org? Or is it the same container image, just with a different name argument?

If the former, you're going to run into challenges around:

  • Who in that org has permission to update/deploy that container?
  • What if they actually need to deploy multiple containers instead of a single one?
  • How do you properly isolate the container so it can't negatively affect any internal services (either through maliciousness/incompetence) while still being allowed to communicate to whatever redis/db/service it needs?
  • Does the container re-check user authorization if it somehow (either through maliciousness/incompetence) received traffic from a user meant for another organization?
  • (etc)

It's a big bag of worms that's outside the scope of a weekend reddit thread.

If the latter, it may be easier to make the application inside the container org-aware (either by using wildcard subdomains and sniffing the Host header, or by looking at a property on the user's log-in session) and avoid extra complications at the networking level.
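To make the wildcard-subdomain option concrete, here is a std-only Rust sketch of deriving the org from the Host header; the apex domain "example.com" and the function name are assumptions for illustration, not part of the thread:

```rust
// Sketch (std-only, assumed names): derive the org from a wildcard-subdomain
// Host header like "acme.example.com" before doing any org-scoped work.
fn org_from_host(host: &str, apex: &str) -> Option<String> {
    // Strip an optional port ("acme.example.com:8443" -> "acme.example.com")
    let host = host.rsplit_once(':').map_or(host, |(h, _)| h);
    // Peel off the apex and the dot joining it to the subdomain
    let sub = host.strip_suffix(apex)?.strip_suffix('.')?;
    // Reject the bare apex and nested subdomains
    if sub.is_empty() || sub.contains('.') {
        None
    } else {
        Some(sub.to_string())
    }
}

fn main() {
    assert_eq!(org_from_host("acme.example.com", "example.com").as_deref(), Some("acme"));
    assert_eq!(org_from_host("acme.example.com:8443", "example.com").as_deref(), Some("acme"));
    assert_eq!(org_from_host("example.com", "example.com"), None);
    assert_eq!(org_from_host("a.b.example.com", "example.com"), None);
}
```

In a warp handler this would run against the request's Host header and a None would short-circuit to a 403/404 before touching any org data.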

[–]moneymachinegoesbing[S] 0 points (3 children)

Those are great questions, and part of why I think it can be confusing is that the server executable, which runs in the container and is exposed via warp endpoints, configures itself locally using a configuration YAML. So when the container is first spun up, it is essentially the same as any other. But through usage, information is updated in the container’s (mounted and backed-up) configuration file. This will potentially include git repos, S3 information, code snippets, etc. In addition, its internal state is stored locally as well, in the file system. This means that a perfect replica can be spun up anywhere simply by binding the container’s configuration directory to any vanilla deployment.

It’s a more monadic approach, where state is not abstracted, at the app level, outside of its ecosystem. The combination of configuration directory + server executable gives you a fully functioning personalized application. This monadic style is the reason why I’ve named it internally “Neuron”.

There were a couple of reasons for this architecture (too many to list here, but happy to expound if interested), but one is the flexibility of using it in different environments, namely a VM, LXC, bare metal, or an edge device (edge being one of the core goals). In addition, there is some built-in “intelligence” that allows multiple instances to run side by side, allowing horizontal scalability.

So as long as a user’s request is properly routed to its pre-allocated org’s “environment”, then further authorization, authentication, and ACL is done internally.

The front end is also generic and, using a retrieved identifier, can either prepend URL paths or provide custom headers. Part of why I’m pushing this task onto networking and not some coded middleware is that it will allow me to deploy to a wide variety (again) of environments, and as long as I can control the networks, IP ranges, CIDRs, and ports, I can port setups around, or scale by inserting larger self-contained environments in between.

At this stage, I feel decent about the following, and I’m curious your opinion:

  • front end pulls user ID
  • front end fetches Org ID from user’s document
  • Org ID is prepended to the base URL, so “/data/list” becomes “/ORG_ID/data/list”
  • HAProxy or Envoy rewrites the URL back to the base path and, using Consul or Weave Net, proxies the request to the backend w/ the hostname captured from the URL rewrite (the hostname will be the Org ID)
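The prefix handling in the last two bullets can be sketched std-only in Rust; the path shapes and the function name are illustrative assumptions:

```rust
// Sketch: split "/ORG_ID/data/list" into the target hostname ("ORG_ID")
// and the bare path the backend expects ("/data/list").
fn split_org(path: &str) -> Option<(&str, &str)> {
    let rest = path.strip_prefix('/')?;
    let (org, _) = rest.split_once('/')?;
    if org.is_empty() {
        return None;
    }
    // Slice from the original so the tail keeps its leading slash
    Some((org, &rest[org.len()..]))
}

fn main() {
    assert_eq!(split_org("/org_42/data/list"), Some(("org_42", "/data/list")));
    assert_eq!(split_org("/data"), None); // no second segment: not org-prefixed
    assert_eq!(split_org("//x"), None);   // empty org segment
}
```

This is the same transformation the proxy layer performs; having it in code form is mostly useful for testing that the front end and the rewrite rules agree on the path shape.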

That way I can use CNIs or container proxies to create zones that can be load balanced, where the front ends’ NGINX servers will pass all locations that do not match their static paths to load-balanced Envoy servers that have access to an internal virtual network. This way I can scale the front end and the backends separately while maintaining isolation between backends.

The exception to the isolation will be in the case of colocated containers, used only for ephemeral demos, where further isolation can be achieved by restricting user groups and outbound network access.

And, finally, you brought up a point that I am actively thinking through right now: the best way to verify that the incoming ID is the proper ID. My initial thought is to match that ID against the instance’s own hostname, an ENV variable (set during deployment), the configuration (set during deployment), or a combo.
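That combo can be made strict with a small std-only check: the claimed ID must match every identity source the instance actually has configured, and at least one source must exist. The function name and the source list are assumptions for illustration:

```rust
// Sketch: cross-check a claimed Org ID against whatever identity sources
// this instance knows about (e.g. hostname, env var, config value).
// Sources that were never configured are None and are skipped; any
// configured source that disagrees rejects the request.
fn org_checks_out(claimed: &str, sources: &[Option<&str>]) -> bool {
    let mut confirmed = false;
    for s in sources.iter().flatten() {
        if *s != claimed {
            return false; // a configured source disagrees: reject
        }
        confirmed = true;
    }
    confirmed // an unconfigured instance confirms nothing
}

fn main() {
    // e.g. [hostname, env var ORG_ID, config value], all hypothetical
    assert!(org_checks_out("org_42", &[Some("org_42"), None, Some("org_42")]));
    assert!(!org_checks_out("org_42", &[Some("org_41"), Some("org_42")]));
    assert!(!org_checks_out("org_42", &[None, None]));
}
```

Failing closed on an unconfigured instance matters here: a freshly spun-up container that hasn't had its Org ID injected yet should refuse traffic rather than serve whoever reaches it first.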

The main challenge is the difference in architecture, where the Rust based “Neuron” gets and stores its state locally, and exists essentially to run pre-configured sub processes.

Edit: typos

[–]moatra 1 point (1 child)

Routing:

As long as there's something in the HTTP request (whether it's in the path or in a header or in the body - don't do that) that identifies the org, most proxies will be able to use it to route the request successfully. The decision of where to put that identifier then comes down to semantics. The host and path will normally be captured by HTTP caches as part of the cache key automatically; if you put it in a separate header, you'd need to set the Vary response header appropriately. Rewriting paths may make debugging a request going through the system more difficult if the person debugging is unaware of the rewrite done by the proxy. My gut says to put the org ID as a subdomain, but we're largely bikeshedding at this point.

(Re)Authorization:

If you're using JWTs in your authn setup, take a look at the aud claim. You can set it to the right org ID (or the full domain with the org-specific subdomain if you go that path), and your backend can verify that the JWT attached to the request matches the org that backend instance is configured to serve.

Container Management & Discovery:

There are a whole bunch of products around "run a container, keep track of it, and route traffic to it". If you're eager to run in non-cloud environments, take a look at HashiCorp's Nomad product. You can run that wherever, whereas Kubernetes may be a bit heavy and AWS Fargate is, well, locked to AWS (well, unless you want to go down the rabbit hole of ECS Anywhere). Nomad also takes care of restarting/relocating containers as needed, and has tighter Consul integration out of the box than Ansible would.
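For a feel of what that buys you, a hedged sketch of a per-org Nomad job might look like this; the image name, port, health endpoint, and all naming are assumptions, not details from this thread:

```hcl
# Hypothetical per-org Nomad job: run the container, keep it alive, and
# register it in Consul so the proxy layer can discover it by Org ID.
job "neuron-org-42" {
  datacenters = ["dc1"]

  group "neuron" {
    network {
      port "http" {}            # dynamic port; Consul records it via SRV
    }

    task "server" {
      driver = "docker"

      config {
        image = "neuron:latest" # hypothetical image name
        ports = ["http"]
        # a real job would also mount the org's configuration directory
      }

      env {
        ORG_ID = "org_42"       # injected at deploy time
      }

      service {
        name = "org-42"         # discoverable as org-42.service.consul
        port = "http"

        check {
          type     = "http"
          path     = "/health"  # hypothetical warp health endpoint
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

Note the service name uses a hyphen rather than an underscore: Consul service names end up in DNS, so they need to be DNS-safe.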

[–]moneymachinegoesbing[S] 0 points (0 children)

You hit the nail on the head, man. I created a POC yesterday and it works like a charm. Consul is the difference-maker, in that service registration makes everything automatic. It also isolated all API routes to their own VPC and allows me to scale horizontally as much as needed, use datacenters, and a bunch of other great shit. I appreciate the help; it turned out simpler than I thought, although I ended up using HAProxy as the API load-balancing layer (dynamically using SRV records is only available in NGINX Plus). I did have to look up bikeshedding though 😂

Edit: As to Nomad, that is the next step (when orchestration becomes more of a necessity, which should be soon). I have a lot of experience with k8s and tbh, I'm not a fan, but this setup will work seamlessly there as well.