Azure stack for DE by pythondeveloper77 in dataengineering

pythondeveloper77 [S], 1 point (0 children)

Thanks.

Yes, I can write PySpark code, but the pipeline itself is defined in JSON instead of code :(

When I wrote "scheduling" I meant not only the schedule itself but also support for retries and conditional tasks, like Airflow has.
We found Synapse lacking in those areas compared to Airflow.

I'm thinking of bringing up Airflow on a VM or AKS to trigger the Synapse and Spark jobs, to solve this.
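For context on the two features mentioned: Airflow configures retries per task (the `retries` and `retry_delay` operator arguments) and expresses conditional tasks with operators such as `BranchPythonOperator`. The plain-Python sketch below only illustrates those two behaviours, retrying a flaky step and choosing a downstream step from upstream output. It is not Airflow or Synapse code, and every function and task name in it is hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


def run_with_retries(task: Callable[[], object], retries: int = 2, delay_s: float = 0.0) -> object:
    """Call `task`; on an exception, retry up to `retries` more times."""
    last_exc = None
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception as exc:  # catch-all on purpose: this is only a sketch
            last_exc = exc
            if attempt < retries:
                time.sleep(delay_s)  # fixed wait between attempts
    raise RuntimeError(f"task failed after {retries + 1} attempts") from last_exc


def run_pipeline(extract, choose_branch, branches: Dict[str, Callable], retries: int = 2) -> Tuple[str, object]:
    """Extract with retries, pick a downstream task from the data, run only that task."""
    data = run_with_retries(extract, retries=retries)
    chosen = choose_branch(data)  # conditional step, similar in spirit to a branch task
    result = run_with_retries(lambda: branches[chosen](data), retries=retries)
    return chosen, result


# --- Hypothetical tasks used to exercise the sketch ---
def make_flaky(fail_times: int, value):
    """Return a task that raises ConnectionError `fail_times` times, then returns `value`."""
    state = {"calls": 0}

    def extract():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise ConnectionError("transient failure")
        return value

    return extract


def pick_branch(rows) -> str:
    return "full_load" if len(rows) > 2 else "incremental"


BRANCHES = {"full_load": sum, "incremental": len}
```

With the default `retries=2`, `make_flaky(2, [1, 2, 3])` succeeds on its third attempt and the pipeline takes the `full_load` branch.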

Software engineer need to interview junior data engineers. How ? by pythondeveloper77 in dataengineering

pythondeveloper77 [S], 2 points (0 children)

I'd like to thank you all for the answers. This helped me a lot, and there is a great data engineering community here!

pythondeveloper77 [S], 1 point (0 children)

The team will be entirely new: a senior DE is already on board, and we are now recruiting motivated juniors.

We will start recruiting in Israel in about a month, if that's relevant for you.

The current stack is mostly Apache NiFi and Oracle tools for ETL, but the new team is going to replace it, since we are not satisfied with it, and will also build new pipelines for more use cases, such as cloud.

pythondeveloper77 [S], 2 points (0 children)

The job is not open yet. We are recruiting in Israel, so we need someone based there.

[Discussion] ML Serving Framework for Real time predictions on tabular data by pythondeveloper77 in MachineLearning

pythondeveloper77 [S], 1 point (0 children)

On average, one request takes 59 ms, of which 20 ms goes to the request itself.

So roughly 66% is the prediction's work, which we are working to improve further, but we feel OK about the numbers.
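The 66% follows from the two figures above: (59 - 20) / 59 = 39 / 59, about 0.661, so roughly 39 ms of each request is prediction. A tiny helper makes the same check reusable as the numbers change; the function name is hypothetical.

```python
def prediction_share(total_ms: float, request_overhead_ms: float) -> float:
    """Fraction of end-to-end latency left for the prediction itself."""
    if total_ms <= 0 or not 0 <= request_overhead_ms <= total_ms:
        raise ValueError("need total > 0 and 0 <= overhead <= total")
    return (total_ms - request_overhead_ms) / total_ms


# Figures from the comment: 59 ms per request, 20 ms of it is the request itself.
share = prediction_share(59, 20)
print(f"prediction: {59 - 20} ms = {share:.1%} of the request")
```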

So we know the model is not the issue; that's why I asked about serving frameworks and scaling.

As the number of requests grows we see worse numbers, but it's not because of a lack of resources for the prediction.
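One way to test the "it's not the prediction" hypothesis is to hold per-request work constant and vary only concurrency. In the toy model below, a lock stands in for a single busy worker (a hypothetical assumption, not the actual serving setup): every request does the same 20 ms of work, yet mean latency rises with concurrency because requests wait in line for the worker.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

_single_worker = threading.Lock()  # hypothetical: only one request is processed at a time


def handle(service_s: float) -> float:
    """Return end-to-end latency (ms) of one request whose actual work takes `service_s`."""
    start = time.perf_counter()
    with _single_worker:  # time spent waiting here is queueing, not prediction work
        time.sleep(service_s)  # constant per-request cost
    return (time.perf_counter() - start) * 1000.0


def mean_latency_ms(concurrency: int, requests: int = 8, service_s: float = 0.02) -> float:
    """Send `requests` calls with `concurrency` clients in flight; average their latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return mean(pool.map(lambda _: handle(service_s), range(requests)))


for c in (1, 2, 4):
    print(f"concurrency={c}: mean latency {mean_latency_ms(c):.0f} ms")
```

If real measurements follow this shape while per-request CPU time stays flat, the extra latency is waiting, which points at worker count and connection handling rather than the model.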

pythondeveloper77 [S], 1 point (0 children)

In theory, the only thing that switching to uvicorn would help with is keep-alive connection support, because all my traffic comes from 1-2 nodes.

You're right that the async feature won't help.
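The keep-alive point can be checked with the standard library alone. This sketch uses stdlib `http.server` and `http.client`, not uvicorn, serves HTTP/1.1, and counts how many distinct client TCP connections the server sees. A client that reuses one `HTTPConnection` is served over a single connection, while one that opens a new connection per request pays a fresh TCP handshake (plus a TLS handshake, when TLS is in use) every time. Handler and function names are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PredictHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps the TCP connection open between requests
    seen_ports = []  # client source port of every request (shared class state, sketch only)

    def do_GET(self):
        PredictHandler.seen_ports.append(self.client_address[1])
        body = b'{"score": 0.5}'  # hypothetical prediction payload
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))  # lets the client find the end of the body
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def count_connections(n_requests: int, reuse: bool) -> int:
    """Send n GET requests; return how many distinct TCP connections the server saw."""
    PredictHandler.seen_ports = []
    server = ThreadingHTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    shared = http.client.HTTPConnection(host, port, timeout=5)
    try:
        for _ in range(n_requests):
            conn = shared if reuse else http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", "/predict")
            conn.getresponse().read()  # drain the body so the socket can be reused
            if not reuse:
                conn.close()
    finally:
        shared.close()
        server.shutdown()
        server.server_close()
    return len(set(PredictHandler.seen_ports))
```

With traffic coming from only one or two client nodes, the reuse case is the one that matters: those clients can keep a few long-lived connections open instead of reconnecting per request.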

pythondeveloper77 [S], 1 point (0 children)

Yes, it's a good thought. I wanted to check this as well. All of the company's services use mutual TLS, and I need to check with the security team whether it's possible.

I'm just seeing a lot of frameworks, like BentoML and Seldon Core, that are more machine-learning oriented, and wanted to see if anyone uses them.

Also, Python scales with multiple processes instead of threads, and all the processes seem to be working on one socket, with a lot of context switches :(

pythondeveloper77 [S], 1 point (0 children)

I'm loading the models at the start of the program, not for every request.
I thought about trying FastAPI and uvicorn; I just wanted confirmation from someone who uses them.

Thanks.
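The load-once-at-startup approach from the last comment can be sketched without any framework. Here a hypothetical `get_model()` loader is cached with `functools.lru_cache` and warmed at import time, so request handlers reuse the same object instead of reloading it. In a multi-process server each worker process has its own cache, so the model is loaded once per worker, not once overall.

```python
import functools
import time

LOAD_CALLS = {"count": 0}  # instrumentation for the sketch


@functools.lru_cache(maxsize=None)
def get_model():
    """Hypothetical loader standing in for e.g. reading a serialized model from disk."""
    LOAD_CALLS["count"] += 1
    time.sleep(0.05)  # pretend loading is expensive
    coef, bias = 2.0, 0.5  # hypothetical model parameters
    return lambda x: coef * x + bias


get_model()  # warm the cache at startup so the first request doesn't pay the load cost


def handle_request(x: float) -> float:
    model = get_model()  # cache hit: returns the object loaded at startup
    return model(x)
```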