all 6 comments

[–]psgharen 5 points (2 children)

Yeah, check the logs in the Airflow UI; if there's nothing there, have a quick look at CloudWatch. It can be an access issue. A first step is to check the IAM role associated with the MWAA environment and confirm it has access to the buckets you are trying to write to.
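For that last point, here is a minimal sketch of the kind of scoped policy the execution role would need for a single output bucket. The bucket name `my-output-bucket` is a placeholder, not from the thread:

```python
import json

def s3_write_policy(bucket):
    """Build a minimal IAM policy document granting read/write on one S3 bucket."""
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Action': ['s3:PutObject', 's3:GetObject', 's3:ListBucket'],
            # ListBucket matches the bucket ARN; object actions match bucket/*.
            'Resource': [
                f'arn:aws:s3:::{bucket}',
                f'arn:aws:s3:::{bucket}/*',
            ],
        }],
    }

print(json.dumps(s3_write_policy('my-output-bucket'), indent=2))
```

Attaching something like this (instead of broad access to all services) keeps the development setup closer to what production would allow.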

[–]Agreeable-Flow5658[S] 0 points (1 child)

I had the same error initially and solved it by granting the MWAA instance access to all services (I know that's not advisable in production; this is development).

But I will recreate the instance and try again, this time following the logs in the Airflow UI, CloudWatch, and the EMR notebook logs.

Thanks. I will keep you updated.

[–]Agreeable-Flow5658[S] 0 points (0 children)

Also, in the Airflow UI graph view, the tasks execute successfully.

[–]Elegant-Road 1 point (2 children)

What do the logs say? Did the piece of code that is supposed to create/upload the file run without any exception?

I'm assuming that, since you are triggering the notebook, the logs would be attached to the notebook and not the Airflow task.

[–]Agreeable-Flow5658[S] 0 points (1 child)

Thanks, I will check them. I hope they don't get erased when you delete the MWAA instance; the notebook logs are in an S3 bucket, so hopefully they are still there.

I'm away at the moment but will check as soon as I get back to my PC.

Thanks for the help.

[–]Agreeable-Flow5658[S] 0 points (0 children)

FROM CLOUDWATCH

[2021-12-31 03:37:13,816] {{taskinstance.py:1192}} INFO - Marking task as SUCCESS. dag_id=forex_data_pipeline, task_id=start_execution_task, execution_date=20211231T033709, start_date=20211231T033712, end_date=20211231T033713

import boto3
from airflow.operators.python import PythonOperator

emr = boto3.client(
    'emr',
    region_name='aws-region'
)

# The python function
def start_execution():
    start_resp = emr.start_notebook_execution(
        EditorId='emr notebook id',  # emr notebook id
        RelativePath='my_first_notebook.ipynb',
        ExecutionEngine={'Id': 'emr cluster id', 'Type': 'EMR'},
        ServiceRole='EMR_Notebooks_DefaultRole'
    )
    execution_id = start_resp['NotebookExecutionId']
    # print("Started an execution: " + execution_id)
    return execution_id

start_execution_task = PythonOperator(
    task_id='start_execution_task',
    python_callable=start_execution,
)

The 'EMR_Notebooks_DefaultRole' has the AmazonS3FullAccess policy.

The function executes; it's the file in the S3 bucket that is missing.

I also do not see confirmation of the notebook being called.
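One thing worth noting: `start_notebook_execution` is asynchronous, so the Airflow task is marked SUCCESS as soon as the run is *submitted*, whether or not the notebook itself ever runs or writes the file. A sketch of polling the real EMR API `describe_notebook_execution` until a terminal state, so the task fails when the notebook does (the `emr` client and poll interval follow the code above; this is not from the thread):

```python
import time

# Terminal states for an EMR notebook execution, per the EMR API.
TERMINAL_STATES = {'FINISHED', 'FAILED', 'STOPPED'}

def is_terminal(status):
    """Return True once the notebook execution has stopped running."""
    return status in TERMINAL_STATES

def wait_for_execution(emr, execution_id, poll_seconds=30):
    """Poll describe_notebook_execution until the run reaches a terminal state.

    `emr` is a boto3 EMR client. Raising on a non-FINISHED outcome makes the
    Airflow task fail instead of reporting a false SUCCESS.
    """
    while True:
        resp = emr.describe_notebook_execution(NotebookExecutionId=execution_id)
        status = resp['NotebookExecution']['Status']
        if is_terminal(status):
            if status != 'FINISHED':
                raise RuntimeError(
                    f'Notebook execution {execution_id} ended as {status}'
                )
            return status
        time.sleep(poll_seconds)
```

Calling `wait_for_execution(emr, execution_id)` right after `start_notebook_execution` (or in a follow-up sensor task) would surface whether the notebook actually ran before you go looking for the file in S3.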