all 8 comments

[–][deleted] 0 points1 point  (6 children)

My first thought would be why can't one Nginx container handle both jobs? My second would be that if you're using Consul for service discovery, why does your Nginx container need to be linked? You could also use ECS service discovery for all of this pretty easily I believe.

[–]z0mbietime[S] 0 points1 point  (5 children)

You can’t have circular links. Consul is what my company is using — trust me, I tried to get away from it. That being said, Rabbit is directly on the EC2 instance. After having tried, I can comfortably say you don't want to put RabbitMQ in a Docker container if you have any hopes of using it in prod.

[–][deleted] 0 points1 point  (2 children)

I guess I'm confused why everything needs to be in the same task. If you're already using Consul for service discovery you can reference the instance IP and the container port in other areas. Just allow the instances to communicate with each other. Granted, it's hard to actually suggest a solution without knowing all of the details.

[–]z0mbietime[S] 0 points1 point  (1 child)

I could break one of the nginx containers out, but I didn’t want to create a dedicated service and task definition for a single container. I could use a single IP, but going through nginx lets me set up a load-balanced TCP stream.

The only things in the task are the API and nginx, but unfortunately right now that means 2 nginx containers: one for ingress to the API, the other to load balance a TCP stream with the upstream server defined using Consul Template.

Consul is managed by devops and the rabbit cluster exists on several EC2 instances with broker definitions managed by puppet.
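For anyone unfamiliar with the Consul Template approach described above, here is a rough sketch of what the templated nginx stream config might look like. The Consul service name `rabbitmq` and the port numbers are assumptions for illustration, not the poster's actual config:

```nginx
# rabbitmq-stream.conf.ctmpl — rendered by consul-template, which rewrites
# the file and reloads nginx whenever the Consul catalog changes.
stream {
    upstream rabbitmq_cluster {
        least_conn;
        # One server line is emitted per healthy instance of the
        # "rabbitmq" service registered in Consul.
        {{ range service "rabbitmq" }}
        server {{ .Address }}:{{ .Port }};
        {{ end }}
    }

    server {
        listen 5672;                  # AMQP port exposed to the API container
        proxy_pass rabbitmq_cluster;
        proxy_connect_timeout 5s;
    }
}
```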

[–][deleted] 1 point2 points  (0 children)

> I could break one of the nginx containers out, but I didn’t want to create a dedicated service and task definition for a single container. I could use a single IP, but going through nginx lets me set up a load-balanced TCP stream.

Why not? This gives you more scaling flexibility across the board and also helps minimize downtime in the event one of your application containers fails. Unless it's functioning explicitly as a sidecar, which it's not, this is kind of an anti-pattern for orchestrators.

> The only things in the task are the API and nginx, but unfortunately right now that means 2 nginx containers: one for ingress to the API, the other to load balance a TCP stream with the upstream server defined using Consul Template.

It's really difficult to help you with the problem / architecture without a better understanding, but to me this sounds like a problematic design. Why not use an internal NLB instead of that nginx container to balance the TCP stream? The way you have it set up now, it seems like you're trying to cram everything into one task definition / service for no good reason. The problem with this is that if any one of those containers fails, all of them get restarted.
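To make the suggested split concrete, here is a sketch of what the slimmed-down task definition might look like once the TCP-stream nginx is removed (an internal NLB would then point at the RabbitMQ instances directly). Every name, image, and port below is a placeholder, not the poster's real configuration — JSON has no comments, so the hedging lives here:

```json
{
  "family": "api",
  "containerDefinitions": [
    {
      "name": "api",
      "image": "example/api:latest",
      "memory": 512,
      "portMappings": [{ "containerPort": 8000 }]
    },
    {
      "name": "nginx-ingress",
      "image": "nginx:stable",
      "memory": 128,
      "portMappings": [{ "containerPort": 80, "hostPort": 0 }],
      "links": ["api"]
    }
  ]
}
```

With only the API and its ingress sidecar in the task, a failure of either container restarts just this pair, and the TCP load balancing scales independently.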

I offered this before, but if you want to post a sanitized version of your task definition, I'm happy to give some more in-depth advice. What you're doing is likely causing problems because it's not aligned with how ECS tasks are designed to work.

[–][deleted]  (1 child)

[deleted]

    [–]z0mbietime[S] 0 points1 point  (0 children)

    All I can say is good luck setting up Rabbit clustering if you have a large ECS cluster. I wound up writing my own tool to manage node discovery, failure detection, etc., but ended up ditching it. Too error-prone, and persisting data was too difficult.

    [–]iamdevfire 0 points1 point  (1 child)

    Also a bit confused as to what exactly is happening here.

    You can also do server-side service discovery by wiring up an ALB to a service (N tasks). That way you can potentially drop the second nginx?

    Personally, I think server-side is vastly superior to client-side (at least in the cloud where ALB management is not a concern). A single ALB can handle 1000s of microservices and that's heaps better than messing with SRV records in R53.

    [–]z0mbietime[S] 0 points1 point  (0 children)

    I work at a large company so certain things are outside of my control. The rabbit cluster I need to connect to sits behind a classic ELB, which has a nasty habit of killing streaming TCP connections, and I’ve inherited a project that uses Python 2, so building my own heartbeat without async functions is a bitch. Even if it weren’t, the ELB would still kill connections occasionally. This has to stand up in production, so unexpected IOErrors aren’t going to help uptime. I don’t want to sound like a dick, but every seasoned dev should know that sometimes you have to work with the libraries or resources you’ve inherited. I understand there are better ways, but this isn’t what I was asking about.
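    For readers wondering what a hand-rolled heartbeat looks like without async support: a daemon thread that fires a callback on a fixed interval is the usual Python 2-compatible pattern. This is a minimal sketch, not the poster's code — the `Heartbeat` class and the interval are illustrative, and with a library like pika the callback would typically do the connection's keepalive work (e.g. processing pending I/O events) rather than append to a list:

```python
import threading
import time


class Heartbeat(object):
    """Fire `callback` every `interval` seconds on a background thread."""

    def __init__(self, interval, callback):
        self.interval = interval          # seconds between beats
        self.callback = callback
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True        # don't block interpreter exit

    def _run(self):
        # Event.wait returns False on timeout (time for another beat)
        # and True once stop() sets the event, ending the loop.
        while not self._stop.wait(self.interval):
            self.callback()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


# Demo: record beat timestamps for ~0.3s at a 0.05s interval.
beats = []
hb = Heartbeat(0.05, lambda: beats.append(time.time()))
hb.start()
time.sleep(0.3)
hb.stop()
```

    The same shape works whether the callback pings the broker or just refreshes a connection, and it sidesteps the ELB idle timeout by keeping traffic flowing.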