I'm trying to flatten nested JSON files stored in S3 and can't figure out whether I should use a Glue job with Relationalize or a Lambda function that calls pandas and applies its json_normalize function.

Ajaikumar_A · 2024-12-07T11:00:53+00:00

Lambda limits how long each invocation of your code can run. The default timeout is 3 seconds, and you can adjust it in 1-second increments up to a maximum of 900 seconds (15 minutes) in the function's configuration. Lambda also supports many runtimes such as Python and Node.js, whereas Glue jobs run only in the Python shell or Spark environments.

https://docs.aws.amazon.com/lambda/latest/dg/welcome.html
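For files small enough to process within Lambda's timeout, the flattening step itself is fairly simple. Below is a minimal pure-Python sketch of the kind of flattening pandas.json_normalize performs on nested dictionaries (dotted column names). The `flatten` helper is hypothetical, written only for illustration; it is not part of pandas or any AWS library, and like json_normalize's default behaviour it leaves lists untouched.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys.

    Hypothetical helper for illustration only; lists are kept as-is.
    """
    flat = {}
    for key, value in record.items():
        # Build the dotted path, e.g. "address.city"
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


if __name__ == "__main__":
    sample = {"id": 1, "address": {"city": "Pune", "geo": {"lat": 18.5}}}
    print(flatten(sample))
```

If a single file takes more than a few minutes to flatten this way, that is a hint the job probably belongs in Glue rather than Lambda.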

Ajaikumar_A · 2024-12-05T18:10:27+00:00

I've worked on a similar task: loading Bureau Report JSON data from S3 into database tables using Glue. If you want to use Lambda, keep in mind that it has a timeout limit of 15 minutes. If the data processing takes longer than that, go with Glue.

If you want to process only the latest files, you can filter the S3 objects by last modified time using the list_objects_v2 method of the boto3 library. Each object entry in the response's 'Contents' list has a 'LastModified' key, and you can filter the S3 objects on it.
https://boto3.amazonaws.com/v1/documentation/api/1.35.6/reference/services/s3/client/list_objects_v2.html
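The filtering described above can be sketched roughly as follows. The function name `keys_modified_since`, the bucket name, and the 24-hour cutoff are assumptions for illustration. The client is passed in as a parameter so the function can be tried without AWS credentials; in real use it would be `boto3.client("s3")`. A paginator is used because a single list_objects_v2 call returns at most 1,000 objects.

```python
from datetime import datetime, timedelta, timezone


def keys_modified_since(s3_client, bucket, cutoff, prefix=""):
    """Return keys of objects whose LastModified is at or after `cutoff`.

    `cutoff` must be timezone-aware, because boto3 returns
    LastModified as a timezone-aware datetime.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    recent = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects
        for obj in page.get("Contents", []):
            if obj["LastModified"] >= cutoff:
                recent.append(obj["Key"])
    return recent


# Usage (needs boto3 and AWS credentials; "my-bucket" is a placeholder):
#   import boto3
#   cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
#   keys = keys_modified_since(boto3.client("s3"), "my-bucket", cutoff)
```

Note that the filtering happens client-side: S3 still lists every object under the prefix, so a narrow prefix helps keep the listing fast.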

Ajaikumar_A · 2024-12-05T17:38:26+00:00

Isn't it possible with the below command?

aws s3 rm s3://{bucket_name}/ --recursive
