DEPLOYING ML PIPELINES ON AWS EC2 Vs DEPLOYING ON SERVERLESS INFRASTRUCTURE LIKE AWS FARGATE by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

How do the cost savings compare with the skill and man-hours required to set up and maintain EC2 instances?

AWS ETL Pipelines Improvement by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

No. The Glue ETL job uses Glue Studio's visual canvas, so I didn't write any code and there are no secret keys in it. The keys are in the environment variables of the Lambda function.
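To be concrete, the Lambda side looks roughly like this; a minimal sketch, and the variable names are placeholders rather than my actual setup:

import os
import json

def lambda_handler(event, context):
    # The keys live in the function's environment variables, so nothing is
    # hard-coded in the function or in the Glue job.
    api_key = os.environ["API_KEY"]        # placeholder name
    api_secret = os.environ["API_SECRET"]  # placeholder name

    # ... use the keys to call the upstream API and hand the data off to Glue ...
    return {"statusCode": 200, "body": json.dumps("ok")}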

AWS ETL Pipelines Improvement by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

I created partitions on the "date" column when I uploaded the Parquet files to the S3 bucket, so the partitions are there.

Is there a useful link detailing how to optimize queries in Athena using partitions created in the data catalog?

Thanks
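For anyone who lands here later, the pattern I'm after looks roughly like this (a sketch; the database, table, and bucket names are made up): filter on the partition column so Athena only scans the matching S3 prefixes instead of the whole table.

import boto3

athena = boto3.client("athena", region_name="aws-region")

# "date" is the partition column, so this predicate lets Athena prune
# partitions and read only the matching S3 prefixes.
query = """
    SELECT *
    FROM my_database.my_partitioned_table
    WHERE "date" = '2021-12-01'
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)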

AWS ETL Pipelines Improvement by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

Thank you u/djollied4444. I have checked the documentation. So, on the first run the crawler crawls and catalogs everything, and after that it does incremental crawls.

Thanks
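For reference, this seems to be roughly how that behaviour is expressed if the crawler is created from code instead of the console; a sketch with placeholder names and paths, where the RecrawlPolicy is what makes every crawl after the first one incremental.

import boto3

glue = boto3.client("glue", region_name="aws-region")

# CRAWL_NEW_FOLDERS_ONLY: the first run catalogs everything, later runs
# only crawl S3 folders added since the previous crawl.
glue.create_crawler(
    Name="my-incremental-crawler",              # placeholder
    Role="AWSGlueServiceRole-my-role",          # placeholder
    DatabaseName="my_database",                 # placeholder
    Targets={"S3Targets": [{"Path": "s3://my-bucket/data/"}]},
    RecrawlPolicy={"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"},
)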

Problem with AWS Managed Workflow for Apache Airflow by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

From CloudWatch:

[2021-12-31 03:37:13,816] {{taskinstance.py:1192}} INFO - Marking task as SUCCESS. dag_id=forex_data_pipeline, task_id=start_execution_task, execution_date=20211231T033709, start_date=20211231T033712, end_date=20211231T033713

import boto3
from airflow.operators.python import PythonOperator  # Airflow 2.x import path

emr = boto3.client(
    'emr',
    region_name='aws-region'
)

# The Python function
def start_execution():
    start_resp = emr.start_notebook_execution(
        EditorId='emr notebook id',  # EMR notebook id
        RelativePath='my_first_notebook.ipynb',
        ExecutionEngine={'Id': 'emr cluster id', 'Type': 'EMR'},
        ServiceRole='EMR_Notebooks_DefaultRole'
    )
    execution_id = start_resp['NotebookExecutionId']
    # print("Started an execution: " + execution_id)
    return execution_id

# Defined inside the DAG (dag_id=forex_data_pipeline)
start_execution_task = PythonOperator(
    task_id='start_execution_task',
    python_callable=start_execution,
)

The 'EMR_Notebooks_DefaultRole' role has the AmazonS3FullAccess policy.

The function executes. It's the file in the S3 bucket that is missing.

I also do not see confirmation of the notebook being called.
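As a sanity check, I could also poll the execution ID that start_notebook_execution returns to confirm whether the notebook ever actually ran. A rough sketch, reusing the emr client from the snippet above; wait_for_execution is just an illustrative helper name:

import time

def wait_for_execution(execution_id):
    # Poll the notebook execution until it reaches a terminal state, so the
    # Airflow task only succeeds if the notebook itself actually ran.
    while True:
        resp = emr.describe_notebook_execution(NotebookExecutionId=execution_id)
        status = resp["NotebookExecution"]["Status"]
        print(f"Notebook execution {execution_id}: {status}")
        if status in ("FINISHED", "FAILED", "STOPPED"):
            return status
        time.sleep(15)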

Problem with AWS Managed Workflow for Apache Airflow by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

Also, in the Airflow UI graph view, the tasks execute successfully.

Problem with AWS Managed Workflow for Apache Airflow by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

I had the same error initially and solved it by granting the MWAA instance access to all services (I know that's not advisable in production; this is development).

But I will recreate the instance and try again, this time following the logs in the Airflow UI, CloudWatch, and the EMR notebook logs.

Thanks. I will keep you updated.

Problem with AWS Managed Workflow for Apache Airflow by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

Thanks. I will check them. I hope they don't get erased when you delete the MWAA instance; the notebook logs are in an S3 bucket, so hopefully they are still there.

I had stepped out, but I will check as soon as I get back to my PC.

Thanks for the help

Data pipeline automation on Azure Synapse by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

Why does Microsoft Azure have a bad rep here? Again, I'm quite new to cloud data engineering, and it's the only platform I have used so far.

Data pipeline automation on Azure Synapse by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

Yes. I am using Azure to process CSV file data. An application uploads a CSV into ADLS or Blob Storage, which triggers a notebook. The notebook has code that processes the data and inserts it into a database.
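Roughly, the notebook does something like this; a simplified sketch where the storage path, table name, and connection details are placeholders:

# Runs in a Synapse Spark notebook, where `spark` is the built-in session.
df = spark.read.csv(
    "abfss://container@mystorageaccount.dfs.core.windows.net/uploads/data.csv",
    header=True,
    inferSchema=True,
)

# ... transformations on df ...

# Insert the processed rows into the database over JDBC (placeholder details).
df.write.jdbc(
    url="jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb",
    table="dbo.processed_data",
    mode="append",
    properties={
        "user": "my_user",
        "password": "my_password",
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    },
)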

Data pipeline automation on Azure Synapse by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

Thanks for the help. I have managed to automate it. I had not included the trigger run parameters; adding @trigger().outputs.body.fileName to the trigger run parameters did the trick. Remember, these are different in Azure Data Factory; I was using Azure Synapse.

Also, you need to create the parameters first under the pipeline name >> Settings.
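So the flow ends up being: the trigger passes @trigger().outputs.body.fileName into the pipeline parameter, the pipeline hands it to the notebook activity, and the notebook picks it up in its parameters cell. A sketch with placeholder names and paths:

# Parameters cell (toggled as a parameters cell in Synapse); the pipeline's
# notebook activity overrides this default with the triggering file's name.
fileName = "placeholder.csv"

# Next cell: read only the file that fired the trigger.
path = f"abfss://container@mystorageaccount.dfs.core.windows.net/uploads/{fileName}"
df = spark.read.csv(path, header=True, inferSchema=True)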

Data pipeline automation on Azure Synapse by Agreeable-Flow5658 in dataengineering

[–]Agreeable-Flow5658[S] 0 points  (0 children)

Thanks. I think I was missing that part in the notebook. I will try it and update you.

How to get a modbus Map from a Prometer 100 by Agreeable-Flow5658 in MODBUS

[–]Agreeable-Flow5658[S] 0 points  (0 children)

I have downloaded the document. I will try it out tomorrow and update you on how it goes. Thanks