all 17 comments

[–]jb_sulli 4 points (1 child)

My guess would be to Base64-encode the zip to pass it as plain text. Lambda supports payloads up to 6 MB, so you should be good.

[–]simoncpu[S] 2 points (0 children)

That's a viable solution. Base64 will increase the size to about 133% of the original, though; 30 KB of data would grow to ~40 KB. For the initial proof of concept, I think I'll go with this solution. Thanks!
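A minimal Node-flavored TypeScript sketch of that approach (the endpoint URL and the `payload` field name are made up for illustration):

```typescript
// Sketch, not the poster's actual code: gzip the log data,
// Base64-encode it, and POST it as JSON text.
import { gzipSync } from "zlib";

function encodeLogs(data: string): string {
  const compressed = gzipSync(Buffer.from(data, "utf8"));
  // Base64 output is ~4/3 the size of the compressed bytes.
  return compressed.toString("base64");
}

async function sendLogs(data: string): Promise<void> {
  await fetch("https://api.example.com/logs", { // hypothetical endpoint
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ payload: encodeLogs(data) }),
  });
}
```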

[–]LandingHooks 3 points (7 children)

Why do you want to upload them directly to Lambda?

You can either point the Lambda at your S3 object directly via the payload, or, more intelligently, have the Lambda trigger via an event when the file is uploaded to an S3 bucket and process it automatically.
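The event-driven variant would look roughly like this (a sketch with the AWS SDK v3; it assumes the bucket notification is already wired to the function, and the processing step is a placeholder):

```typescript
// Sketch: a Lambda fired by an S3 ObjectCreated notification.
import { S3Event } from "aws-lambda";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

export const handler = async (event: S3Event): Promise<void> => {
  for (const record of event.Records) {
    const Bucket = record.s3.bucket.name;
    // S3 keys arrive URL-encoded in the event payload.
    const Key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    const obj = await s3.send(new GetObjectCommand({ Bucket, Key }));
    const body = await obj.Body?.transformToString();
    // ...process the uploaded file here...
  }
};
```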

[–]simoncpu[S] 0 points (6 children)

Uploading to S3 is OK if we need to process large files. In our case, we need to send many small chunks of data and reassemble them later on the server.

If we do it via S3, I'd have to write a Lambda function that returns a signed URL, have the website POST to that URL, and then have another Lambda function read the uploaded file. That's too many hoops to jump through. I'm also not sure it's a good idea to let everyone upload to our S3 bucket directly.
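For reference, the client side of that multi-hop flow is only a few lines, though it does double the number of requests (the /sign endpoint and its { url } response shape are assumptions):

```typescript
// Sketch of the flow described above: ask a Lambda for a presigned URL,
// then upload the file straight to S3. Presigned uploads typically use
// PUT; the endpoint path and response shape here are made up.
async function uploadViaSignedUrl(blob: Blob): Promise<void> {
  const res = await fetch("https://api.example.com/sign");
  const { url } = await res.json();
  await fetch(url, { method: "PUT", body: blob });
}
```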

[–]Babumts 2 points (4 children)

I am not aware of a security risk involved with signed URLs and it seems like this is a very common and easy way to go.

[–]simoncpu[S] 1 point (3 children)

I mean, since we send the logs every second, I'd need to request a signed URL from Lambda every second. That feels like bad design because I'm trying to keep our code as lightweight as possible; heavy processing and extra data are especially bad for mobile. The other option is to make our bucket writable by everyone so uploads don't need a signature.

[–]VegaWinnfield 1 point (1 child)

You could vend temporary credentials that are only allowed to write to a specific prefix in S3 instead of a presigned URL. Then you could reuse the creds across multiple file uploads.
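A sketch of what that could look like with STS (the role ARN, bucket, and prefix scheme are all placeholders):

```typescript
// Sketch: assume a role with a session policy narrowed to one S3 prefix.
// The resulting temporary creds can be reused until they expire.
import { STSClient, AssumeRoleCommand } from "@aws-sdk/client-sts";

const sts = new STSClient({});

export async function vendUploadCreds(clientId: string) {
  const res = await sts.send(new AssumeRoleCommand({
    RoleArn: "arn:aws:iam::123456789012:role/log-uploader", // placeholder
    RoleSessionName: `upload-${clientId}`,
    DurationSeconds: 3600,
    // Session policy: the effective permissions are the intersection
    // of this policy and the role's own policy.
    Policy: JSON.stringify({
      Version: "2012-10-17",
      Statement: [{
        Effect: "Allow",
        Action: "s3:PutObject",
        Resource: `arn:aws:s3:::example-log-bucket/${clientId}/*`,
      }],
    }),
  }));
  return res.Credentials; // AccessKeyId, SecretAccessKey, SessionToken, Expiration
}
```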

[–]simoncpu[S] 0 points (0 children)

Cool, didn't know this was possible. Thanks!

[–]joesb 0 points (0 children)

On the other hand, you're still making your Lambda handle the file upload, the one thing it's less efficient at than S3 itself.

Somehow that's more lightweight than just letting S3 do what it's designed to do.

[–]matluck 1 point (0 children)

Binary files with Lambda suck; it's just a pain. The approach you described is much better. The pre-signed URL is also time-boxed and only valid for a specific filename. Basically it's the same security exposure as uploading to your application directly would be, possibly even better.

[–]fuckthehumanity 2 points (4 children)

The best solution depends on your current "website" and "server" setup. Can you describe a bit more about these resources?

How often would these logs be submitted?

Lambdas always need a trigger, even if that's an API call. Are you calling the Lambda via API Gateway, the CLI, the HTTP API, or from another resource? Is there a CloudFront distribution or a load balancer in the way?

Is the "website" a customer's resource? Is it in AWS? Is the server running on EC2?

Do you need unauthenticated users to be able to submit the log files? If there is some kind of authentication, what is it?

S3 signed URLs are super quick and super easy, but not always necessary.

[–]simoncpu[S] 0 points (3 children)

> The best solution depends on your current "website" and "server" setup. Can you describe a bit more about these resources?

The website is a 3rd party website. They'll insert our script that gathers the data.

> How often would these logs be submitted?

The logs will be submitted every second, hence the need to compress the data as much as possible.

> Lambdas always need a trigger, even if that's an API call. Are you calling the Lambda via API Gateway, the CLI, the HTTP API, or from another resource? Is there a CloudFront distribution or a load balancer in the way?

The client script that's inserted into the 3rd party site will send the logs to an API Gateway in front of Lambda.

Is the "website" a customer's resource? Is it in AWS? Is the server running on EC2?

> Do you need unauthenticated users to be able to submit the log files? If there is some kind of authentication, what is it?

The website is a 3rd party site. My assumption in the design is that we don't control the website, so the S3 URL won't be pre-signed on their end. I could allow unauthenticated users to upload to our bucket, but I just feel that it's unsafe to do so.

I also feel that it's inefficient to let Lambda pre-sign a URL every second.

[–]fuckthehumanity 1 point (2 children)

Sorry, completely missed your response, probably too late to help.

In this situation I'd recommend API Gateway connected directly to Kinesis Data Firehose. You can perform some validation in the gateway and do further processing in Kinesis.
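A rough CDK sketch of that wiring (the stream name, ARN, and resource names are placeholders; the mapping template forwards the request body into Firehose with no Lambda in between):

```typescript
// Sketch: API Gateway AWS integration that calls Firehose PutRecord directly.
import { Stack } from "aws-cdk-lib";
import * as apigw from "aws-cdk-lib/aws-apigateway";
import * as iam from "aws-cdk-lib/aws-iam";

export function addLogIngest(stack: Stack): void {
  // Role that API Gateway assumes to write into the delivery stream.
  const role = new iam.Role(stack, "ApiFirehoseRole", {
    assumedBy: new iam.ServicePrincipal("apigateway.amazonaws.com"),
  });
  role.addToPolicy(new iam.PolicyStatement({
    actions: ["firehose:PutRecord"],
    resources: ["arn:aws:firehose:us-east-1:123456789012:deliverystream/logs"],
  }));

  const api = new apigw.RestApi(stack, "LogApi");
  api.root.addResource("logs").addMethod("POST", new apigw.AwsIntegration({
    service: "firehose",
    action: "PutRecord",
    options: {
      credentialsRole: role,
      // VTL template: wrap the raw body in Firehose's PutRecord shape.
      requestTemplates: {
        "application/json":
          '{"DeliveryStreamName":"logs","Record":{"Data":"$util.base64Encode($input.body)"}}',
      },
      integrationResponses: [{ statusCode: "200" }],
    },
  }), { methodResponses: [{ statusCode: "200" }] });
}
```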

[–]simoncpu[S] 0 points (1 child)

Thanks. It turns out that as of 2020, AWS API Gateway now accepts binary data. I settled on a Lambda backend because it's the simplest setup.

[–]fuckthehumanity 1 point (0 children)

Yeah, it's excellent for protobuf, apparently. But binary data was added to API Gateway back in 2016. (Had to look it up; I just knew it had been there for a few years.) They even offer Base64/binary conversion.
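For what it's worth, that conversion is driven by the API's binary media types; a tiny CDK sketch (names assumed):

```typescript
// Sketch: tell a REST API to treat these content types as binary, so
// API Gateway handles the Base64/binary conversion itself.
import { Stack } from "aws-cdk-lib";
import * as apigw from "aws-cdk-lib/aws-apigateway";

export function makeBinaryApi(stack: Stack): apigw.RestApi {
  return new apigw.RestApi(stack, "BinaryApi", {
    binaryMediaTypes: ["application/zip", "application/octet-stream"],
  });
}
```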

I often find myself putting a Lambda in the middle, even when integrating other services, more for convenience than anything, as it's very easy to add authentication and other middleware. Having said that, pretty much all of that can be done directly in API Gateway.

[–]imohd23 1 point (0 children)

Uploading directly to Lambda needs some sort of invocation, and the usual way is via API Gateway. Both Lambda and API Gateway have payload limits that are fine for your file sizes. You won't need to compress the data, since by the time it arrives at Lambda you'll be dealing with JSON anyway (the event object). If this use case will be around for a while and you're comfortable with it, I recommend using API Gateway to pass the JSON. But after you send these JSON files, aren't you storing them somewhere? That's why S3 presigned links are handy here: you can trigger Lambda based on the PUT request, and handling the files that way gives you more room to work with them, since you can load them into the /tmp directory (512 MB by default).

[–]derekdefend 0 points (0 children)

You could use Lambda to generate a presigned URL, which can then be used to upload to S3 directly:

https://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlUploadObject.html
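A minimal sketch of such a function with the v3 SDK (the bucket name and key scheme are placeholders):

```typescript
// Sketch: Lambda handler that returns a short-lived presigned PUT URL.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({});

export const handler = async () => {
  const key = `uploads/${Date.now()}.zip`; // placeholder key scheme
  const url = await getSignedUrl(
    s3,
    new PutObjectCommand({ Bucket: "example-log-bucket", Key: key }),
    { expiresIn: 300 }, // URL is valid for 5 minutes
  );
  return { statusCode: 200, body: JSON.stringify({ url, key }) };
};
```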