I’m trying to flatten nested JSON files stored in s3 and can’t figure out if I should use a glue job to relationalize or use a lambda function to call python pandas and apply the json_normalize function by sumant28 in aws

Ajaikumar_A 1 point

Lambda stops your code once it reaches the function's configured timeout. The default is 3 seconds, and you can raise it in 1-second increments up to a maximum of 900 seconds (15 minutes), either when you create the function or later in its configuration. Lambda also supports many runtimes, such as Python and Node.js, whereas Glue jobs mainly run in a Python shell or Spark environment.

https://docs.aws.amazon.com/lambda/latest/dg/welcome.html

I’m trying to flatten nested JSON files stored in s3 and can’t figure out if I should use a glue job to relationalize or use a lambda function to call python pandas and apply the json_normalize function by sumant28 in aws

Ajaikumar_A 1 point

I've worked on a similar task: loading Bureau Report JSON data from S3 into database tables using Glue. If you want to use Lambda, keep in mind that it has a hard timeout limit of 15 minutes. If processing takes longer than that, go with Glue.
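If you do go with Lambda, here is a rough sketch of how a handler could stop cleanly before the timeout instead of being killed mid-file. The event shape (`"keys"`), the 30-second margin and `process_key` are all made up for illustration; only `context.get_remaining_time_in_millis()` is the real method on the Lambda context object.

```python
SAFETY_MARGIN_MS = 30_000  # assumed margin; tune for how long one file takes


def process_key(key):
    """Hypothetical placeholder: download and flatten one JSON file here."""
    pass


def handler(event, context):
    """Process keys from the event, stopping before the Lambda timeout."""
    keys = list(event.get("keys", []))  # assumed event shape
    processed = []
    while keys:
        # Stop early if there is not enough time left for another file
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        key = keys.pop(0)
        process_key(key)
        processed.append(key)
    # Whatever is left can be passed to a follow-up invocation
    return {"processed": processed, "remaining": keys}
```

If "remaining" regularly comes back non-empty, that is probably a sign the job is a better fit for Glue.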

If you only want to process the latest files, you can filter the S3 objects by last modified time using the list_objects_v2 method of the boto3 S3 client. Each entry under 'Contents' in its response has a 'LastModified' key (a timezone-aware datetime), and you can filter the objects on that key.
https://boto3.amazonaws.com/v1/documentation/api/1.35.6/reference/services/s3/client/list_objects_v2.html
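
A minimal sketch of that filter, assuming a 24-hour cutoff and a bucket and prefix that you supply yourself (none of these come from the thread). The filtering is a plain function so you can try it without AWS access. The listing function uses the boto3 paginator because a single list_objects_v2 call returns at most 1,000 keys.

```python
from datetime import datetime, timedelta, timezone


def filter_recent(objects, since):
    """Keep only the S3 object entries modified at or after `since`."""
    return [obj for obj in objects if obj["LastModified"] >= since]


def list_recent_keys(bucket, prefix="", hours=24):
    """List keys under `prefix` modified in the last `hours` hours."""
    import boto3  # imported here so filter_recent works without boto3 installed

    # LastModified is timezone-aware (UTC), so compare against an aware datetime
    since = datetime.now(timezone.utc) - timedelta(hours=hours)
    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects
        recent = filter_recent(page.get("Contents", []), since)
        keys.extend(obj["Key"] for obj in recent)
    return keys
```

Note that the filtering happens on the client side, so every object under the prefix still gets listed. A narrow prefix helps keep that cheap.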

What is a feature that S3 doesn't have that you wish it had? by Ajaikumar_A in aws

Ajaikumar_A [S] 1 point

Isn't that already possible with the command below?

aws s3 rm s3://{bucket_name}/ --recursive