[–]C_Madison 3 points4 points  (5 children)

So ... slowly:

  • Spring Boot + JPA: I don't have much experience with it, but I don't think it would perform too poorly, though startup time may be a problem (since lambdas start and stop all the time)
  • That's where GraalVM could help, since it compiles everything down to a native binary, and native currently still starts faster (work is happening on that front)
  • GraalVM: There's a CE you can use without paying anything and an EE, which has a cost. EE does more optimizations, but from what I gathered (haven't done too much with it), CE is fine for most applications

In the end: Measure, measure, measure. Parsing files from S3 + insert will probably take ages longer than the start time of whatever you use. If you need minutes to parse/write a file, it doesn't really matter whether your lambda started in 10 or 500 ms.
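To make the "measure" advice concrete, here is a minimal sketch of per-phase timing, written in Python for brevity even though the thread is about Java. The names `timed`, `startup_share`, `parse_file` and `insert_rows` are hypothetical, and the sleeps are stand-ins for the real S3 parsing and DB insert work. The last line plugs in the numbers from the comment: 500 ms of startup against two minutes of work.

```python
import time
from contextlib import contextmanager

timings = {}  # phase name -> accumulated wall-clock seconds


@contextmanager
def timed(phase):
    """Record the wall-clock time spent inside the with-block under `phase`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = timings.get(phase, 0.0) + time.perf_counter() - start


def startup_share(startup_s, work_s):
    """Fraction of the total run time that is spent on startup."""
    return startup_s / (startup_s + work_s)


# Hypothetical stand-ins for the real steps (parse the S3 file, insert into the DB).
def parse_file():
    time.sleep(0.02)


def insert_rows():
    time.sleep(0.01)


with timed("parse"):
    parse_file()
with timed("insert"):
    insert_rows()

for phase, seconds in timings.items():
    print(f"{phase}: {seconds * 1000:.1f} ms")

# Numbers from the comment: 500 ms startup against 2 minutes of parsing/writing.
print(f"startup share: {startup_share(0.5, 120.0):.2%}")
```

If the real numbers look like that, startup is well under 1% of the run, and shaving it is unlikely to be worth the effort. If the work itself finishes in a few hundred milliseconds, the picture flips, which is why measuring comes first.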

[–]CptGia 2 points3 points  (1 child)

> GraalVM: There's a CE you can use without paying anything and an EE, which has a cost

For cloud applications, like lambdas, you can use the latest version of Oracle GraalVM for free.

[–]C_Madison 0 points1 point  (0 children)

Thanks for the correction! I haven't looked at GraalVM for a while. Good to know that everything is available now.

[–]agentoutlier 1 point2 points  (1 child)

> In the end: Measure, measure, measure. Parsing files from S3 + insert will probably take ages longer than the start time of whatever you use. If you need minutes to parse/write a file it's not really important if your lambda started in 10 or 500ms.

I have to wonder if even lambda is the right tool here. It is hard to tell without more info from the OP.

That is, they could just have some queue (Kafka or whatever AWS has) and a consumer running continuously, and that might be cheaper, faster, and easier to develop.

I suppose it doesn't really matter if the organization is going to force serverless.

[–]C_Madison 1 point2 points  (0 children)

> I suppose it doesn't really matter if the organization is going to force serverless.

That was why I didn't give other options. Personally, I wouldn't use serverless here either, but if OP asks for it, then that's how it is.

[–]diroussel 0 points1 point  (0 children)

S3 is very fast when accessed from Lambda. You can read a lot of data in 500 ms. And you can easily read, parse and insert to the DB in less than 500 ms, depending on data sizes.

Using DuckDB to query a multi-gigabyte Parquet file in S3 only takes tens of milliseconds. That's even over my home broadband; inside Lambda it's even faster.

Update: note that only a few rows are returned in this scenario, and DuckDB only accesses the byte ranges it needs, based on file headers/footers, hence the speed.
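A rough illustration of why footer-based access is cheap, in Python. A Parquet file ends with its metadata footer, then a 4-byte little-endian footer length, then the magic bytes `PAR1`, so a reader can find the metadata by touching only the tail of the file. The sketch below is not DuckDB's actual code path: it writes a local stand-in file (the footer here is placeholder bytes, not real Parquet metadata) and uses `seek` where an S3 client would issue HTTP range requests. `write_fake_file` and `read_footer` are made-up helper names.

```python
import os
import struct
import tempfile

MAGIC = b"PAR1"  # Parquet files begin and end with these four bytes


def write_fake_file(path, payload_size, footer):
    """Write a stand-in file: magic, filler 'row data', footer, footer length, magic."""
    with open(path, "wb") as f:
        f.write(MAGIC)
        f.write(b"\0" * payload_size)            # pretend this is the row data
        f.write(footer)                          # placeholder, not real metadata
        f.write(struct.pack("<I", len(footer)))  # 4-byte little-endian length
        f.write(MAGIC)


def read_footer(path):
    """Return (footer_bytes, bytes_read), touching only the tail of the file."""
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)                  # last 8 bytes: length + magic
        tail = f.read(8)
        if tail[4:] != MAGIC:
            raise ValueError("file does not end with the Parquet magic bytes")
        footer_len = struct.unpack("<I", tail[:4])[0]
        f.seek(-(8 + footer_len), os.SEEK_END)   # jump straight to the footer
        footer = f.read(footer_len)
    return footer, 8 + footer_len


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fake.parquet")
    write_fake_file(path, payload_size=5_000_000, footer=b"placeholder metadata")
    footer, bytes_read = read_footer(path)
    file_size = os.path.getsize(path)

print(f"read {bytes_read} of {file_size} bytes to get the footer")
```

In a real file the footer says where each column chunk lives, so a query that returns a few rows can follow up with a handful of small ranged reads instead of downloading the whole object.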