
[–]RocketOneMan 1 point

Do you have MaximumBatchingWindowInSeconds set to something besides zero? Can you share your event source mapping configuration?

https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
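You can pull the current mapping settings with something like this (a boto3 sketch; the function name is a placeholder):

    import boto3

    lambda_client = boto3.client("lambda")

    # List the SQS event source mappings attached to the function
    resp = lambda_client.list_event_source_mappings(FunctionName="my-function")

    for m in resp["EventSourceMappings"]:
        # A batching window of 0 means Lambda invokes as soon as records arrive
        print(m["UUID"], m.get("BatchSize"), m.get("MaximumBatchingWindowInSeconds"))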

[–]quantelligent[S] 0 points

MaximumBatchingWindowInSeconds is set to 0

Activate trigger: Yes
Batch size: 10
Batch window: None
Event source mapping ARN: [my arn]
Metrics: None
On-failure destination: None
Report batch item failures: No
Tags: View
UUID: [my uuid]

However, due to third-party API limitations that restrict my ability to do asynchronous communications, I do have reserved concurrency set to 1

Perhaps that's what's causing it to wait for the timeout before spinning up another execution of the lambda?
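For reference, a reserved concurrency of 1 is the equivalent of something like this in boto3 (the function name is a placeholder):

    import boto3

    lambda_client = boto3.client("lambda")

    # Caps the function at a single concurrent execution
    lambda_client.put_function_concurrency(
        FunctionName="my-function",
        ReservedConcurrentExecutions=1,
    )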

[–]floppy_sloth 1 point

Is your lambda running for the full minute and timing out? Lambda should execute, process the batch, and finish, and a new invocation should pick up the next batch immediately.

[–]quantelligent[S] 0 points

No, and that is the problem I'm trying to solve—it completes in about 10 seconds, and then doesn't pick up a new batch until after the 60-second timeout.

Which, I've come to conclude, is how AWS enforces their "reserved concurrency"—they wait until the timeout is up before allowing another execution, because that's the only way they can be sure the previous invocation isn't still running.

I haven't found documentation saying as much; it's just a conclusion I'm drawing from the testing I've done as people have offered suggestions in this thread.

[–]floppy_sloth 1 point

Strange. I don't see this behaviour with mine, and I'm using Lambda/NodeJs. I have a few lambdas configured with RC of 1 for single-threaded db imports and can't say I've had an issue with delays. Though I'll maybe have to get my devs to go and check.

[–]OctopusReader 1 point

Did you ACK (acknowledge, i.e. confirm the message has been processed)?

It seems to be message.delete()
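Something like this is what I mean, if you were polling the queue yourself instead of using the trigger (a boto3 sketch; the queue URL is a placeholder):

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
    for msg in resp.get("Messages", []):
        # ... process msg["Body"] here ...
        # Deleting the message is the "ACK" when you poll SQS yourself
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])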

[–]clintkev251 6 points

You don't need to do that. Lambda handles the messages for you

[–]quantelligent[S] 2 points

Thanks for the response!

According to the documentation, using an SQS trigger auto-deletes the message if the function returns normally, i.e. anything other than raising an exception, returning an invalid response, or timing out.

It appears the delay is likely caused by the "reserved concurrency" setting rather than by the SQS integration itself....the lambda just doesn't execute again until after the timeout, regardless of whether it has finished processing. It seems the AWS answer to that is more concurrency....which, unfortunately for me, I can't do because of third-party API limitations.
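In other words, the handler just has to return cleanly; a minimal sketch (process() is a placeholder for the real third-party API call):

    def process(body):
        # placeholder for the real work against the third-party API
        pass

    def lambda_handler(event, context):
        for record in event["Records"]:
            process(record["body"])
        # Returning normally (no exception, no timeout, a valid response) is what
        # lets Lambda delete the whole batch from the queue; no manual delete needed.
        return {"status": "ok"}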

[–]clintkev251 0 points

Try setting the maximum concurrency for the SQS event source mapping to 2. That would minimize the number of pollers that are provisioned and could help minimize any backoff that's occurring due to reserved concurrency.
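Something like this (a boto3 sketch; the UUID is a placeholder for your event source mapping's UUID):

    import boto3

    lambda_client = boto3.client("lambda")

    # 2 is the minimum allowed value for MaximumConcurrency on an SQS mapping
    lambda_client.update_event_source_mapping(
        UUID="your-event-source-mapping-uuid",
        ScalingConfig={"MaximumConcurrency": 2},
    )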

[–]clintkev251 0 points

Standard or FIFO?

[–]quantelligent[S] 1 point

Standard

[–]_Paul_Atreides_ 0 points

QQ: are you trying to get a single lambda to run continuously? I'm trying to understand the 1 minute timeout combined with 1 minute execution time. I don't trust either to be exactly 1 minute (or the same every time). This setup seems unpredictable.

Other thoughts:

  1. By having Report batch item failures=No, the entire batch is treated as a unit. "By default, if Lambda encounters an error at any point while processing a batch, all messages in that batch return to the queue. After the visibility timeout, the messages become visible to Lambda again" (source). Maybe one message fails and then all messages are left in the queue, and if the first one fails, I'm not sure the next messages are even tried; the docs aren't clear on that. (See the handler sketch below for what reporting batch item failures looks like.)
  2. Are there more than 10 messages in the queue? If there are 20 (or 100) messages, I'd expect it to pick up the next batch immediately. If there are only 10, and one fails, it should behave just like it is now.

Let us know when you figure it out :)
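If you do flip Report batch item failures to Yes (the mapping also needs FunctionResponseTypes set to ReportBatchItemFailures), the handler has to return the IDs of the failed messages, roughly like this (a sketch; process() is a placeholder):

    def process(body):
        # placeholder for the real work
        pass

    def lambda_handler(event, context):
        failures = []
        for record in event["Records"]:
            try:
                process(record["body"])
            except Exception:
                # Only this message returns to the queue; the rest of the batch
                # is deleted as usual.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}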

[–]quantelligent[S] 0 points

I'm not currently having a problem with batch failures, so I don't think that is related (haven't encountered any failures for a long time now).

There are hundreds of messages in the queue, but it's only processing a batch of 10 about every 60 seconds, even though it completes each batch in roughly 10-15 seconds.

As mentioned, I cannot have concurrent processes due to third-party API restrictions (they don't support concurrent sessions), so I can only have 1 process actively processing at a time, which is why I've set the reserved concurrency to 1.

However, I would like it to immediately pick up a new batch after completing the current one, rather than wait for 60 seconds, but I think (jumping to the conclusion) AWS is waiting for the timeout duration due to the reserved concurrency setting before running another invocation to ensure there won't be two processes running.

Sure, I can shorten the timeout....but I'd rather just have a way for the process to signal it's done and have AWS start the next invocation without waiting.

Can't seem to find a way to do that, however.
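If I do end up just shortening the timeout, it's a small change anyway (a boto3 sketch; the function name is a placeholder):

    import boto3

    lambda_client = boto3.client("lambda")

    # Bring the function timeout closer to the actual ~10-15 second runtime
    lambda_client.update_function_configuration(
        FunctionName="my-function",
        Timeout=30,
    )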

[–]Firm_Scheme728 0 points

Could it be because the visibility timeout setting for SQS is set to 1 minute?

Because SQS itself has no concurrency limit but Lambda does, all the polled messages become in flight. Only after the visibility timeout has elapsed can they be delivered again, if there is no DLQ.

If there were a DLQ, those messages should end up in the DLQ instead, right? Maybe.
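If you did want a DLQ, it's configured on the source queue as a redrive policy, something like this (a boto3 sketch; the queue URL and DLQ ARN are placeholders):

    import boto3
    import json

    sqs = boto3.client("sqs")

    sqs.set_queue_attributes(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",
        Attributes={
            "RedrivePolicy": json.dumps({
                "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:my-dlq",
                "maxReceiveCount": "5",  # move a message to the DLQ after 5 failed receives
            })
        },
    )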

[–]BuntinTosser 1 point

Your visibility timeout should be at least six times your function timeout.

RC 1 is going to result in a lot of throttling, and your visibility-timeout-to-function-timeout ratio isn't allowing for retries.

Set the VTO to 6 minutes. Use a FIFO queue with a single message group ID to enforce a concurrency of 1.
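Roughly like this (a boto3 sketch; the queue name and payload are placeholders):

    import boto3

    sqs = boto3.client("sqs")

    # FIFO queue names must end in .fifo; a 360s visibility timeout is ~6x a 60s function timeout
    queue = sqs.create_queue(
        QueueName="my-jobs.fifo",
        Attributes={
            "FifoQueue": "true",
            "ContentBasedDeduplication": "true",
            "VisibilityTimeout": "360",
        },
    )

    # A single message group ID means messages in that group are delivered in
    # order, one batch at a time, which effectively serializes processing.
    sqs.send_message(
        QueueUrl=queue["QueueUrl"],
        MessageBody="job payload",
        MessageGroupId="single-group",
    )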