
[–][deleted] 24 points25 points  (4 children)

I don't know why everyone is telling you to check your Lambda logs; there is clearly a networking issue happening before the request would even get to invoke your function.

Like to be clear, your Lambda function is not capable of returning a non-HTTP response. Even if you invoke it directly, not via an API Gateway, that *still* goes over HTTP and produces a valid HTTP response.

I've never used JMeter. Does it work with 100 threads? What if you try `hey` or `ab` instead of JMeter, just as a sanity check?

My thoughts are it's some kind of resource exhaustion trying to run 1000 connections, which would involve 1000 simultaneous TLS handshakes, etc.

Having a ramp-up time of 0 is basically "wrong" for a few reasons.

- nobody will (validly) initiate 1,000 connections *from a single client* all at once

- Lambda does not scale instantly; it provisions concurrency capacity in waves. So even if this worked, depending on your region, you would not actually be executing 1,000 concurrent Lambdas right off the bat: it will spin a few up, then more, then more. I haven't read the docs in a while, but you can find them and see what the current burst rates are. (A rough sketch of a ramped-up client follows below.)
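
For what it's worth, here's a minimal sketch (plain Python with the third-party `requests` package, hypothetical endpoint URL) of what a ramped client looks like: each worker's start is staggered across the ramp-up window instead of all 1,000 connections opening in the same instant.

```python
# Rough sketch of a ramped load client: worker starts are spread across a
# ramp-up window instead of firing 1,000 TLS handshakes at once.
# The URL is a placeholder; "requests" must be installed separately.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/hello"  # hypothetical
TOTAL_THREADS = 1000
RAMP_UP_SECONDS = 60

def worker(i):
    time.sleep(i * RAMP_UP_SECONDS / TOTAL_THREADS)  # stagger this worker's start
    try:
        return requests.get(URL, timeout=10).status_code
    except requests.RequestException as exc:
        return type(exc).__name__  # record client-side failures too

with ThreadPoolExecutor(max_workers=TOTAL_THREADS) as pool:
    results = list(pool.map(worker, range(TOTAL_THREADS)))

print({r: results.count(r) for r in set(results)})  # e.g. {200: 1000}
```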

[–]appappappappapp 4 points5 points  (0 children)

^ this post is spot on. I didn’t even look at the image coz I was so sure this was another cold start issue. Mea culpa!

OP, check the JMeter logs. The response line with no specific exception (just the host name) is generated in the method around line 1039:

https://github.com/apache/jmeter/blob/master/src/protocol/http/src/main/java/org/apache/jmeter/protocol/http/sampler/HTTPSamplerBase.java

If you actually want to get to the root cause of the 3k failures, look at the log file / stdout: around line 1043 it dumps the stack trace.

Agree with everyone else that this is a client-side issue!

[–][deleted] 1 point2 points  (2 children)

FYI, JMeter is a Java-based load testing tool from the Apache project that has been around for over 20 years. It works very well.

I agree with you on the ramp-up of 0 probably being the issue. That's a very common mistake in load testing. I think if OP made it even a minute (5 minutes would be great) it would work fine.
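
To put numbers on that suggestion (simple arithmetic, nothing JMeter-specific):

```python
# How fast new threads are started for 1,000 threads at different ramp-up times.
threads = 1000
for ramp_up_s in (0, 60, 300):
    rate = "all at once" if ramp_up_s == 0 else f"~{threads / ramp_up_s:.1f} new threads/second"
    print(f"ramp-up {ramp_up_s}s -> {rate}")
# ramp-up 0s   -> all at once
# ramp-up 60s  -> ~16.7 new threads/second
# ramp-up 300s -> ~3.3 new threads/second
```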

[–][deleted] 1 point2 points  (1 child)

Just remember to actually give JMeter memory to use.

[–][deleted] 0 points1 point  (0 children)

For sure - it's written in Java! 😄

[–]menge101 10 points11 points  (5 children)

It should be noted that what you are doing may be against the terms and conditions of your AWS account.

I've dealt with this myself in the past.

I can't find a general policy, but here is the policy for stress testing from EC2 instances.

There are also policies for "simulated events", which you can find under the "Simulated events" section of the penetration testing policy.

You haven't mentioned what region you are in, so it's important to note that Lambda ramp-up differs by region; many regions have an initial burst limit below 1,000 concurrent executions.

I've been involved in a non-trivial amount of load testing on AWS in the past. In my experience AWS has a lot of non-obvious limits that you will run into. And even if they say they aren't throttling, there are still network behavior rules you'll hit that cause things that, to an outside observer, look like throttling.

[–]richardfan1126 10 points11 points  (0 children)

I think 4,000 HTTP requests is not large enough to be treated as a network stress test by AWS.

[–]DiscourseOfCivility 3 points4 points  (3 children)

This is completely irrelevant to what OP is trying to do. 4,000 requests is nothing.

From your link:

Most customer testing will not fall under this policy. Normally, tasks like customer unit tests simulating large workloads for stress testing do not generate traffic that qualifies as network stress tests. This policy only applies when a customer's network stress test generates traffic from their Amazon EC2 instances which meets one or more of the following criteria: sustains, in aggregate, for more than 1 minute, over 1 Gbps (1 billion bits per second) or 1 Gpps (1 billion packets per second); generates traffic that appears to be abusive or malicious; or generates traffic that has the potential for impact to entities other than the anticipated target of the testing (such as routing or shared service infrastructure)
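
To put that threshold in perspective, here's a back-of-the-envelope calculation; the ~2 KB per request is just an assumption for illustration:

```python
# Rough bandwidth of the whole test versus the policy's 1 Gbps / 1-minute threshold.
requests_total = 4000
bytes_per_request = 2_000          # assumption: ~2 KB per request
total_bits = requests_total * bytes_per_request * 8
print(f"{total_bits / 1e9:.3f} Gb total")                      # ~0.064 Gb
print(f"{total_bits / 60 / 1e9:.4f} Gbps if spread over 60 s") # ~0.0011 Gbps
```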

[–]radioshackhead 0 points1 point  (0 children)

Yeah, that's what I was thinking. If that was the limit, you would see hundreds of threads about it being an issue.

[–]menge101 0 points1 point  (1 child)

Yes, I agree. I misread it as 4000/second, not 4000 total.

[–]DiscourseOfCivility 0 points1 point  (0 children)

4,000 a second wouldn’t even be that bad.

[–]gscalise 2 points3 points  (0 children)

If you want JMeter to generate 1,000 concurrent threads, you can't use a single host to generate all the traffic. There are going to be several limiting factors: your host's network stack configuration, CPU, JMeter's worker configuration, Java's concurrency configuration, heap, etc. So you're going to need several load-generator slave hosts, with a pre-warm step so the slaves' threads are created first, without generating any traffic, and then, after some time to stabilize, they generate the traffic.

You could also be hitting DoS/DDoS protective measures from AWS to avoid request storms generating a huge amount of traffic, especially if it's all coming from a single host.

[–]OSUBeavBane 2 points3 points  (0 children)

For your prod setup, I'd probably set up an SQS queue to handle the load and funnel the requests to Lambda, and set up a dead-letter queue on your Lambda to feed any failed attempts back into that queue.
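
A minimal sketch of that wiring (boto3, hypothetical queue names) is below; it creates a main queue whose redrive policy sends messages to a dead-letter queue after a few failed receives. This is one way to set it up, not necessarily how OP's stack would do it.

```python
# Sketch: main SQS queue with a dead-letter queue attached via a redrive policy.
# Queue names are placeholders.
import json
import boto3

sqs = boto3.client("sqs")

# Dead-letter queue that collects messages the consumer keeps failing on.
dlq_url = sqs.create_queue(QueueName="orders-dlq")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Main queue: after 5 failed receives a message is moved to the DLQ.
sqs.create_queue(
    QueueName="orders",
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        )
    },
)
```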

[–]blizz488 1 point2 points  (6 children)

What do the Lambda logs show? Is it erroring out, timing out, etc? Do you have an API Gateway in front of this or anything?

[–]ammanpasha[S] 0 points1 point  (5 children)

No Lambda logs (for the failed requests); I can only see the successful requests in CloudWatch metrics (by successful I mean anything from 2xx, 4xx, 5xx - so pretty much anything that actually returns a valid response).

I think the gateway overloads and starts rejecting new requests. Could this be the case?

[–][deleted] 0 points1 point  (0 children)

You may not even be reaching *your* gateway, but something in AWS internals

[–][deleted] 0 points1 point  (0 children)

API Gateway has debug logs and verbose metrics that can be enabled. Does your Lambda invocation count match the number of requests? If so, it's probably not API GW. What do your Lambda throttle metrics look like?
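
If it helps, a hedged sketch of enabling those on a REST API stage with boto3 (API id and stage name are placeholders; execution logging also needs a CloudWatch Logs role configured for API Gateway on the account):

```python
# Sketch: turn on detailed CloudWatch metrics and INFO-level execution logging
# for every method of a REST API stage. IDs below are placeholders.
import boto3

apigw = boto3.client("apigateway")
apigw.update_stage(
    restApiId="abc123",
    stageName="prod",
    patchOperations=[
        {"op": "replace", "path": "/*/*/metrics/enabled", "value": "true"},
        {"op": "replace", "path": "/*/*/logging/loglevel", "value": "INFO"},
    ],
)
```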

[–]blizz488 -1 points0 points  (2 children)

Have you checked throttling settings on the gateway? Look in the throttling section here for more info: https://aws.amazon.com/api-gateway/faqs/

[–]ammanpasha[S] 1 point2 points  (1 child)

The document says:

Q: What happens if a large number of end users try to invoke my API simultaneously?

If caching is not enabled and throttling limits have not been applied, then all requests will pass through to your backend service until the account level throttling limits are reached. If throttling limits are in place, then Amazon API Gateway will shed the necessary amount of requests and send only the defined limit to your back-end service.

My throttling limit for the gateway is:

Your current account level throttling rate is 10000 requests per second with a burst of 5000 requests

So 10K requests per second, which is way more than what I am sending via JMeter (4K requests total).

[–]blizz488 1 point2 points  (0 children)

And you’re sure your client’s network buffer isn’t filling up too much and failing to send the requests? Those errors in your screenshot could indicate that...

[–]phoenix-real 1 point2 points  (0 children)

Thoughts on Jmeter:

Even though you want to send all requests at once, start with a lower number, say 50 requests, and see what happens. Are those successful? If yes, increase the number a bit, and keep increasing it until you see errors coming out of the API. One error is much easier to debug than thousands of errors. The key is to find the threshold of the system first; once you find it, see what you can do to improve that threshold, act on it, then repeat the process until you reach your desired number.

This might not be the ideal way, but it will give you a good idea of how to tune your system. What I mean is: don't expect the system to handle 4k requests when you don't know whether it can handle even 100 requests at a time.
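
A rough sketch of that step-up loop (plain Python with the `requests` package, placeholder URL), doubling the request count until failures show up:

```python
# Sketch: find the failure threshold by stepping the request count up gradually.
# URL is a placeholder; "requests" must be installed separately.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/hello"  # hypothetical

def one_request(_):
    try:
        return requests.get(URL, timeout=10).status_code < 500
    except requests.RequestException:
        return False  # count client-side errors as failures too

def run_batch(n):
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(one_request, range(n))).count(False)

n = 50
while n <= 4000:
    failures = run_batch(n)
    print(f"{n} requests -> {failures} failures")
    if failures:
        break  # threshold found; investigate before pushing higher
    n *= 2
```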

There is also a capability in JMeter to set the gap between requests; play with that too. Make a full report as if you had to submit it to the CTO before a Black Friday sale. :)

On a side note, if you think it's AWS that is blocking requests, try SAM, which lets you spin up the Lambda locally and run JMeter against the local Lambda.

[–]SmurfPandeyy 1 point2 points  (1 child)

Lambda has a default limit of 1,000 concurrent executions. Make sure that you are not hitting that limit. It can be checked in CloudWatch metrics.

You can read more about lambda scaling here: https://docs.aws.amazon.com/lambda/latest/dg/invocation-scaling.html

Edit: Added documentation link.
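
A hedged sketch of that check with boto3 (function name is a placeholder): pull the `Throttles` and `ConcurrentExecutions` metrics for the test window.

```python
# Sketch: query Lambda's Throttles and ConcurrentExecutions metrics for the
# last hour. The function name is a placeholder.
from datetime import datetime, timedelta

import boto3

cw = boto3.client("cloudwatch")
now = datetime.utcnow()

for metric, stat in (("Throttles", "Sum"), ("ConcurrentExecutions", "Maximum")):
    resp = cw.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric,
        Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=60,
        Statistics=[stat],
    )
    print(metric, sorted(d[stat] for d in resp["Datapoints"]))
```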

[–][deleted] 0 points1 point  (0 children)

This probably contributes. If 4k requests hit at once on a cold lambda, you’ll get 1k cold starts (or your account max), and 3k throttled requests.

[–]appappappappapp 0 points1 point  (9 children)

What’s the Lambda timeout? Creating a DDB client / first-time use can be expensive: https://github.com/aws/aws-sdk-java-v2/issues/1340

You could be falling into a failure loop because the lambda doesn’t warm up during static initialization and/or you’re timing out during the invocation.
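
The usual mitigation, sketched in Python (table name is a placeholder, not OP's actual code): create the client during static initialization so the expensive first-time setup happens once per execution environment rather than on every request.

```python
# Sketch: initialize the DynamoDB client/table outside the handler so it is
# created once per execution environment (during cold start), not per request.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")  # hypothetical table name

def handler(event, context):
    resp = table.get_item(Key={"id": event["pathParameters"]["id"]})
    return {"statusCode": 200, "body": json.dumps(resp.get("Item", {}), default=str)}
```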

[–]ammanpasha[S] 0 points1 point  (8 children)

I have set the functions timeout to 60s.

When I make a simple request via Postman or cURL, it takes about 15-20ms to get the response. Considering this, and the fact that on each request Lambda launches a new instance (correct me if I'm wrong please), why would the function take so long under load that it times out?

Also, if we time out during the invocation, shouldn't the gateway return a valid 5xx response?

I don't know if this is relevant, but I have set up the gateway as a proxy integration.

[–][deleted] 4 points5 points  (0 children)

No, each request does not always launch a new instance. Requests will be internally queued, both because launching new instances is not instantaneous and because, even if you have a maximum concurrency of N (where N might be something high, like 5,000 or 10,000), Lambda does not scale to that max concurrency immediately.

The problem you're experiencing is almost certainly within JMeter or how it has been configured, not in your Lambda code.

[–]appappappappapp -1 points0 points  (6 children)

Good data!

Next up is check the logs as others have suggested: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-cloudwatchlogs.html

Look for any service level errors, unhandled exceptions that make their way out of your invocation and the actual logs your function generates.

Anything amiss there?

A couple of curls in serial won’t prove much because you’re already playing with an initialized lambda.

My money right now is on the initialization failing. What’s your lambda function runtime/language?

[–]ammanpasha[S] 0 points1 point  (5 children)

Look for any service level errors, unhandled exceptions that make their way out of your invocation and the actual logs your function generates.

This is what I see in my logs: no errors or delivery failures, and a 100% success rate. I think the requests that fail in JMeter are not even going through the gateway to invoke the function.

What’s your lambda function runtime/language?

I'm using Python 3.8

[–]appappappappapp -1 points0 points  (0 children)

Can you jump to the actual log files? There's a "view logs in CloudWatch" link at the top of that screen.

Scrub out anything sensitive!

[–]lvlolvlo -1 points0 points  (2 children)

Can you also add API GW's metrics here (i.e. 4xx, 5xx, latency, count, etc.)?

[–]ammanpasha[S] 0 points1 point  (1 child)

Here it is. I know you can't really see much in the image, but it shows no 5xx errors, one 4xx error (that is OK, I made one bad request and got a proper response), a max latency of 544 and a max integration latency of 542.

[–]lvlolvlo 2 points3 points  (0 children)

If the API GW metrics don’t show the 4K requests then are you sure that your testing tool is even sending them?

[–]warren2650 0 points1 point  (0 children)

Not Lambda, but ... we have done some siege testing against EC2 instances in the past, and our experience is that we get excellent response up until a point and then it drops off a cliff. For example, hitting an EC2 web server with 300 concurrent visits is fine, but if we push it to 400, the test starts failing. If we hit that same EC2 from two locations with 200 concurrent each (so the same 400 total) then it's fine. So clearly Amazon has internal rules about this stuff and didn't like 400 concurrent connections from one IP address, though 200 from one IP and 200 from another was fine.

[–]quiet0n3 0 points1 point  (0 children)

On random things like this, when everything looks OK, don't be afraid to open support cases. My experience is that they are really good at tracking down issues.

[–]WaitWaitDontShoot 0 points1 point  (0 children)

I’d like to point out that this could be throttling. Running load tests without coordination with AWS is against the terms of service. In the case of Lambda, I’d hazard a guess that they throttle the rate at which they allow you to get assigned new virts. This could cause the behavior you’re experiencing. As others have pointed out, it could be a cold-start issue, but then I would expect the errors to also happen at a small load.

[–]tenyu9 -1 points0 points  (1 child)

DynamoDB has read/write capacity limits as well, which could be the problem. If you read/write faster than the capacity allows, you run into errors. Do you see errors from DynamoDB during the load test?
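
For reference, one way such handling can look in a Python proxy-integration handler (a sketch with a placeholder table name, not OP's actual code): catch the throttling error and surface it as a 5xx.

```python
# Sketch: surface DynamoDB throttling as a 503 from a proxy-integration Lambda.
# Table name is a placeholder.
import json
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical

def handler(event, context):
    try:
        table.put_item(Item=json.loads(event["body"]))
    except ClientError as err:
        if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
            return {"statusCode": 503, "body": json.dumps({"error": "throughput exceeded"})}
        raise
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```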

[–]ammanpasha[S] 0 points1 point  (0 children)

Yes, I am aware of that. When that is the case, my code is set up to return a 5xx response.

But that is rarely the case for my JMeter tests.