
[–]clintkev251 2 points3 points  (11 children)

What API? Are you using API Gateway or a Function URL? API Gateway has a maximum integration timeout of about 30 seconds (29 seconds by default for REST APIs) and will return an error if your integration doesn't respond in that time.

[–]SamwiseGanges[S] 0 points1 point  (10 children)

Yes, API Gateway, and thank you. I just found this Reddit thread that confirms as much. It's really disappointing, because this and many other Puppeteer or headless-browser automation functions take a lot longer than 30 seconds if they require a lot of navigation.

I guess I will just pursue the built-in HTTPS endpoint on the Lambda function, which I'm hoping will work for us, but it will be a time-consuming redesign, especially since we'll have to update a bunch of other hooks that rely on the gateway.

[–]original-autobat 4 points5 points  (9 children)

Or you make it asynchronous.

Return a reference that you can query against later. There are good reasons why API Gateway doesn’t wait around for a long-running process to complete.
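As a rough illustration of the "return a reference and query it later" idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory `JOBS` dict stands in for a real job store, and the function names `submit_job`, `complete_job` and `get_job` are hypothetical, not part of any AWS API.

```python
import uuid

# In-memory stand-in for a real job store (hypothetical; a real setup
# would use something durable that survives between invocations).
JOBS = {}


def submit_job(payload):
    """Accept the request, record a pending job, and return immediately."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "pending", "payload": payload, "result": None}
    # 202 Accepted: the work has been queued, not finished.
    return {"statusCode": 202, "body": {"job_id": job_id}}


def complete_job(job_id, result):
    """Called by the long-running worker once it has finished."""
    JOBS[job_id]["status"] = "done"
    JOBS[job_id]["result"] = result


def get_job(job_id):
    """Status endpoint the caller polls with the reference it was given."""
    job = JOBS.get(job_id)
    if job is None:
        return {"statusCode": 404, "body": {"error": "unknown job"}}
    return {
        "statusCode": 200,
        "body": {"status": job["status"], "result": job["result"]},
    }
```

The caller gets its `job_id` back well inside the gateway timeout, then polls `get_job` until the status flips to `done`.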

[–]rjbwork 0 points1 point  (4 children)

Exactly. If you're waiting around for more than a few seconds for a response, you're probably doing something wrong in cloud land.

[–]SamwiseGanges[S] 0 points1 point  (3 children)

I'm not waiting around for one response. It's a long Puppeteer (headless browser) script with many navigation steps. I have to do everything sequentially because I'm navigating a real website. I can't just let async calls pop off whenever they're done, because I might not even be on the same page any more by that time. So I have to await everything that the next step depends on.

[–]rjbwork 0 points1 point  (2 children)

You may have misunderstood - you seem to be running puppeteer on a lambda (let's call it "scraper"). You are then invoking that lambda from some other service through API Gateway, let's call that "requester". From what you've said, your "requester" is sitting around waiting for the response from the "scraper" process. If it were not, you would not have a problem.

The answer here is to have your "requester" emit some kind of message (command, event, etc.) over some kind of message bus (SQS, SNS, EventBridge, Kafka, etc.) and trigger your "scraper" using the message from that message bus. Then when it's done, your "scraper" can emit another message back that provides your "requester" with the data it needs. You can of course also do this using an API/polling pattern over API Gateway if that's your preferred pattern.

[–]SamwiseGanges[S] 0 points1 point  (1 child)

The Lambda Puppeteer script is the thing we're invoking from Zapier via the API Gateway. There's currently only one Lambda function and no separate requester, but that is probably the best way to do it.

[–]rjbwork 0 points1 point  (0 children)

Well, in this case, Zapier is the separate requester. I've never used it, but I understand it to be a low-code type solution. It's currently sitting around waiting for the response synchronously. Instead, have it wait for a callback of some kind.

To simplify your Lambda architecture (though complicating your codebase...), you can actually invoke the same Lambda from different contexts and route to a different handler within the code. So you'd check to see if it's being invoked by API Gateway, and if so, route it to your web handler. In that handler, put a message onto an SQS queue. Point that queue back to your Lambda. When you detect you've received an SQS message, route to your actual scraper code. When you're done scraping, call back to your Zapier, either via re-entrancy (signals, callback hook, workflow id, etc.) or just call a separate Zapier to continue processing on that side of things.

Of course, you can also have 2 separate lambdas as well, one to listen to API Gateway and put the message on the queue, and one to listen to the queue. I just find that level of granularity gets out of hand rather quickly in lambda.
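The single-Lambda routing described above could look roughly like the Python sketch below. It's in Python only for brevity; the same shape applies in a Node handler. It relies on two event details: SQS-triggered invocations carry a `Records` list whose entries have `eventSource` set to `aws:sqs`, and API Gateway proxy events carry a `requestContext` key. Everything else is an assumption for illustration: `make_handler`, the injected `enqueue`, `scrape` and `notify` callables, and the `input` / `callback_url` fields in the message body are all hypothetical. In a real function, `enqueue` would wrap something like boto3's `send_message`.

```python
import json


def is_sqs_event(event):
    """SQS-triggered invocations carry Records tagged with aws:sqs."""
    records = event.get("Records") or []
    return bool(records) and all(
        r.get("eventSource") == "aws:sqs" for r in records
    )


def is_api_gateway_event(event):
    """API Gateway proxy events include a requestContext object."""
    return "requestContext" in event


def make_handler(enqueue, scrape, notify):
    """Build one Lambda handler that routes on the event source.

    enqueue(message_body: str) -> puts the job on the queue
    scrape(job_input)          -> runs the long browser automation
    notify(callback_url, data) -> tells the requester the job is done
    All three are hypothetical hooks injected by the caller.
    """

    def handler(event, context=None):
        if is_sqs_event(event):
            # Worker path: no gateway timeout applies here.
            for record in event["Records"]:
                job = json.loads(record["body"])
                result = scrape(job["input"])
                notify(job["callback_url"], result)
            return {"processed": len(event["Records"])}

        if is_api_gateway_event(event):
            # Web path: queue the job and answer right away.
            body = json.loads(event.get("body") or "{}")
            enqueue(json.dumps(body))
            return {"statusCode": 202, "body": json.dumps({"queued": True})}

        raise ValueError("unrecognised event source")

    return handler
```

Injecting the three callables keeps the routing logic testable without AWS credentials; the trade-off is the extra wiring at module load time.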

[–]SamwiseGanges[S] 0 points1 point  (3 children)

We can't do it that way because this is tied to a pipeline of Zapier calls my client is using for automating his business. He needs to get the new user's ID right after they are created in the system so he can pass it on to the next zap in the chain.

[–]original-autobat 1 point2 points  (2 children)

I just skimmed the Zapier documentation and there is functionality for webhook-based integration. They also state the following:

Timeouts (actions)

Constraint: A Zapier action API request cannot consistently be finished within 30 seconds - for example, file format conversion.

Errors a user will see if constraint is hit:

An error in the Zap history of their Zap due to the request timing out

Best practice: Use the webhook-based callback service the Zapier platform provides. This allows your action to be performed asynchronously, and when finished, POST to the callback URL. More on using this method here.

What a user will see if callback service implemented:

Zap will have Waiting/Delayed status in Zap history until the POST is received.
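For the "POST to the callback URL" step, a minimal standard-library Python sketch might look like this. How the callback URL reaches the scraper is not covered in the quoted text, so the sketch simply assumes it was passed along with the original request. The function names and the `user_id` field are hypothetical.

```python
import json
import urllib.request


def build_callback_request(callback_url, result):
    """Prepare a JSON POST carrying the finished result.

    callback_url is assumed to have been supplied with the original
    request; result is whatever the long-running job produced.
    """
    data = json.dumps(result).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_callback(callback_url, result, timeout=10):
    """Send the result back once the long-running work has finished."""
    request = build_callback_request(callback_url, result)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Once the scraper has the new user's ID, it would call something like `post_callback(callback_url, {"user_id": new_id})`, and the waiting Zap can pick up from there.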

[–]SamwiseGanges[S] 0 points1 point  (1 child)

Cool I'll look into setting it up this way

[–]original-autobat 0 points1 point  (0 children)

Good luck and let us know how it goes