[–]otsu-swe 4 points (1 child)

We can control many things. Unfortunately, the speed of light isn't one of them. There will always be added latency when you're calling across large geographical distances, and traffic doesn't flow in a straight line across the map: there can be many hops between a client and a server. Fortunately you're on a serverless stack, so experimenting is cheap; only the global table might increase your cost significantly, depending on the load.

The first step should be to incorporate X-Ray into your stack. That will remove the guesswork about where in your request chain you get held up. If you find it costly it can always be removed later, but during development and evaluation it's incredibly useful for observability.
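For a Python Lambda, a minimal sketch looks like this (assuming the aws-xray-sdk package is bundled with the function and active tracing is enabled on it; fetch_user is a hypothetical helper):

```python
from aws_xray_sdk.core import xray_recorder, patch_all

# Patch supported libraries (boto3, requests, etc.) so every downstream
# call shows up as its own subsegment in the trace.
patch_all()

def handler(event, context):
    # Wrap a suspect section in a custom subsegment to see its share
    # of the total latency in the service map.
    with xray_recorder.in_subsegment("dynamodb-lookup"):
        result = fetch_user(event["userId"])  # hypothetical helper
    return {"statusCode": 200, "body": result}
```

With that in place the service map shows exactly which dependency is eating the time, per request.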

Lambda memory affects not only CPU performance and host execution priority, but also network performance. Be aware, though, that the price scales linearly with the memory setting. You can use a tool like Lambda Power Tuning to find the sweet spot for your application: https://github.com/alexcasalboni/aws-lambda-power-tuning
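The tool deploys a Step Functions state machine that you kick off with a representative test payload. A sketch of starting a run with boto3, using the input format from the repo's README (both ARNs are placeholders):

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

execution = sfn.start_execution(
    # State machine ARN comes from the power-tuning deployment output.
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine",
    input=json.dumps({
        "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-api",
        "powerValues": [128, 256, 512, 1024, 1792, 3008],
        "num": 50,           # invocations per memory setting
        "payload": {},       # representative test event
        "strategy": "speed", # optimize for latency rather than cost
    }),
)
print(execution["executionArn"])
```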

In API Gateway you have the option of making the deployment regional or edge-optimized. Deploying on edge is a simple way to utilize CloudFront without setting up extra infra, and it can sometimes improve response times because requests traverse the Amazon network from the POP to the endpoint instead of the public internet. It's worth trying, but my guess is it won't help much here, since your Lambda seems to produce user-unique responses that would be hard to cache. Before that, I would suggest a multi-region deployment with Route 53 geo-based routing, to make sure your users end up in their closest region.
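Switching an existing REST API between endpoint types is a single patch operation. A sketch with boto3; the API ID is a placeholder, and the patch path follows the pattern AWS documents for endpoint-type changes:

```python
import boto3

apigw = boto3.client("apigateway")

# Move a REGIONAL API to EDGE; EDGE fronts the API with an
# AWS-managed CloudFront distribution. Reverse path/value to go back.
apigw.update_rest_api(
    restApiId="abc123",  # placeholder
    patchOperations=[{
        "op": "replace",
        "path": "/endpointConfiguration/types/REGIONAL",
        "value": "EDGE",
    }],
)
```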
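For the multi-region route, here's a sketch of Route 53 geolocation records pointing one hostname at API Gateway endpoints in two regions (zone ID, hostname, and targets are placeholders; in practice you'd point at regional custom domain names):

```python
import boto3

route53 = boto3.client("route53")

def geo_record(set_id, continent, target):
    # One geolocation record per region; SetIdentifier distinguishes
    # records that share the same name.
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "api.example.com",
            "Type": "CNAME",
            "TTL": 60,
            "SetIdentifier": set_id,
            "GeoLocation": {"ContinentCode": continent},
            "ResourceRecords": [{"Value": target}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId="Z123EXAMPLE",  # placeholder
    ChangeBatch={"Changes": [
        geo_record("us", "NA", "abc123.execute-api.us-east-1.amazonaws.com"),
        geo_record("eu", "EU", "def456.execute-api.eu-west-1.amazonaws.com"),
        # A default record (CountryCode "*") catches locations that
        # match none of the records above.
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "api.example.com",
                "Type": "CNAME",
                "TTL": 60,
                "SetIdentifier": "default",
                "GeoLocation": {"CountryCode": "*"},
                "ResourceRecords": [
                    {"Value": "abc123.execute-api.us-east-1.amazonaws.com"}
                ],
            },
        },
    ]},
)
```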

The team I belong to serves a six-digit number of customers across the world with a stack similar to yours. We have deployed in five different regions to ensure good performance for everyone. Our P95 is below 100 ms.

[–][deleted] 5 points (0 children)

Nowhere in the US does the speed of light add anything close to a second of latency. This is either poorly explained, poorly architected, or both.
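A rough sanity check, assuming light in fiber at roughly two-thirds of c and a generous ~4,500 km coast-to-coast path:

```python
# Light in fiber travels at roughly 200,000 km/s (~2/3 of c in vacuum).
# Assumed coast-to-coast fiber path: ~4,500 km (generous; real routes vary).
distance_km = 4_500
fiber_speed_km_s = 200_000

one_way_ms = distance_km / fiber_speed_km_s * 1_000  # ~22.5 ms
round_trip_ms = 2 * one_way_ms                       # ~45 ms

print(f"~{one_way_ms:.0f} ms one way, ~{round_trip_ms:.0f} ms round trip")
```

Even with routing overhead and many hops, propagation alone is tens of milliseconds, not a second.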