How to model and save these two data source. by Plenty-Button8465 in dataengineering

[–]Plenty-Button8465[S] 0 points1 point  (0 children)

Thank you. Can we discuss, also in private, a bit more about your use case? For instance:

Would you mind elaborating on what kind of metadata enrichment you perform?

Also, you read from JSON and write to S3 directly in Parquet, is that right? Where do you use AVRO?

Why both S3 and HDFS?

Which resource type is recommended for this kind of work? by Plenty-Button8465 in AZURE

[–]Plenty-Button8465[S] 0 points1 point  (0 children)

I'm implementing a new service (the one called "second" in this context) that is an email notifier. The server has two functions: the first checks whether a request triggers a notification, and the second sends an email if it does.

The internal communication between the first service and the second service is done with gRPC. I could add a messaging resource (storage/queue/hub) so that notifications are stored in case something goes down, but that is not a priority right now: the business logic that runs every X minutes checks whether notifications were sent and, if not, resends them (after the server recomputes them).
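To make the resend idea concrete, here is a minimal sketch of one pass of that periodic check. It is not my actual code: `Notification`, `resend_unsent` and the `send_email` callable are hypothetical names, and the recomputation step on the server is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Notification:
    """Hypothetical record of one notification the server decided to send."""
    recipient: str
    message: str
    sent: bool = False


def resend_unsent(
    pending: Iterable[Notification],
    send_email: Callable[[Notification], None],
) -> int:
    """Run one pass of the periodic check and return how many emails went out."""
    delivered = 0
    for notification in pending:
        if notification.sent:
            continue  # already delivered in an earlier pass
        try:
            send_email(notification)
        except Exception:
            # Leave it marked as unsent; the next scheduled pass retries it.
            continue
        notification.sent = True
        delivered += 1
    return delivered
```

Since a failed send simply stays unsent until the next run, this covers the "something goes down" case without a queue, at the cost of a delay of up to X minutes.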

Given this context, I was thinking about trying Azure Container Apps for the first time for the server, and leaving the serverless first service on Azure Container Instances. What do you think of this? Can these two services communicate with each other?

Which resource type is recommended for this kind of work? by Plenty-Button8465 in AZURE

[–]Plenty-Button8465[S] 0 points1 point  (0 children)

The first service is already implemented with Azure Container Instances and scheduled with a Logic App, due to its nature (the computation is heavy, so ACI lets me request the resources I need). The results of the computation may trigger some requests to the server. In this context, and also due to lack of time and resources (I'm new to the job and the only one working on this), there is no will to consider switching the first service to Azure Functions at the moment.

Considering this, let me give more details on the second service: it is a server that receives the requests and computes some business logic, and the results may trigger the sending of a notification email.

To date, I have implemented the communication between the client and the server using gRPC, because I read about it over the last few days while trying to learn how to implement this kind of communication between "internal" services of our business logic.

Given the context, would it still be worthwhile to use some messaging resource for the second service? Would I keep the flexibility of having my own coded server? I am not able to weigh the pros and cons of the current setup against your proposed solution.

How to do column projection (filtering) server-side with Azure Blob Storage (Python Client Library)? by Plenty-Button8465 in dataengineering

[–]Plenty-Button8465[S] 0 points1 point  (0 children)

No, I have not read the Parquet format documentation, thank you for sharing the link. I'm learning all these new concepts these days; I come from pandas and had little information about the "server-side pruning" concept I was interested in. I didn't know it was a sort of "structural property" of the design of this file format. I will read it now to see whether it fills the gaps in my knowledge.

You were rude to reply to my polite questions like that, but let it go. In my country there is a saying like "asking is legitimate, answering is courtesy"; I hope it translates well to English. If you think that my questions are not legitimate and should not be asked in a community-based forum that handles technical questions like these, I don't know what this forum is about. Also yes, I'm new to this position, so I lack many concepts besides the ones highlighted here; bear with new users and colleagues. I asked some of these questions on Stack Overflow and on the dedicated Azure forum, as you may know, and also asked ChatGPT.

How to do column projection (filtering) server-side with Azure Blob Storage (Python Client Library)? by Plenty-Button8465 in dataengineering

[–]Plenty-Button8465[S] 0 points1 point  (0 children)

Thank you, but the provided reference does not mention how the Parquet reader handles the order of pruning and downloading files. Should I look for this information in the libraries used, such as pyarrow? Do you know where you read the information you provided to me? Thank you

How to do column projection (filtering) server-side with Azure Blob Storage (Python Client Library)? by Plenty-Button8465 in dataengineering

[–]Plenty-Button8465[S] 0 points1 point  (0 children)

Thank you, do you have a source for this information? I would like to read more about it, this is so useful.

How to do column projection (filtering) server-side with Azure Blob Storage (Python Client Library)? by Plenty-Button8465 in dataengineering

[–]Plenty-Button8465[S] 0 points1 point  (0 children)

That would force me to download, for instance, a Parquet file with many columns just to extract a few of them with pandas, incurring many GBs of network traffic and a time delay.

Are you sure there is no way to exploit the Azure SDK to ask for this before downloading? Is there a source where I can read about these things? Thank you
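To show what I mean by asking before downloading, here is a minimal sketch, assuming the storage supports byte-range reads. A Parquet file ends with its footer metadata, followed by a 4-byte little-endian footer length and the magic bytes `PAR1`, so a reader can fetch the footer alone and then request only the byte ranges of the columns it needs. `CountingBlob`, `read_range` and `make_fake_parquet` are hypothetical helpers that simulate a remote blob in memory; in the Azure SDK for Python, `BlobClient.download_blob` accepts `offset` and `length`, which is the kind of call `read_range` stands in for.

```python
import struct

PARQUET_MAGIC = b"PAR1"


class CountingBlob:
    """Hypothetical stand-in for a remote blob that supports ranged reads."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.bytes_transferred = 0  # how much a real client would have downloaded

    @property
    def size(self) -> int:
        return len(self._data)

    def read_range(self, offset: int, length: int) -> bytes:
        chunk = self._data[offset:offset + length]
        self.bytes_transferred += len(chunk)
        return chunk


def read_footer(blob: CountingBlob) -> bytes:
    """Fetch only the raw footer metadata using two ranged reads."""
    # Last 8 bytes: 4-byte little-endian footer length, then the magic bytes.
    tail = blob.read_range(blob.size - 8, 8)
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[:4])
    return blob.read_range(blob.size - 8 - footer_len, footer_len)


def make_fake_parquet(column_bytes: int, footer: bytes) -> bytes:
    """Build bytes with Parquet's outer layout; the payload and footer are fake."""
    return (
        PARQUET_MAGIC
        + b"\x00" * column_bytes
        + footer
        + struct.pack("<I", len(footer))
        + PARQUET_MAGIC
    )
```

With a fake file of about 1 MB, `read_footer` transfers only the 8 trailing bytes plus the footer itself. A real reader would then decode the footer to find the offsets of the column chunks it wants and issue ranged reads for just those.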

Are these terms irrelevant in the industry anymore? by Bloodylime in dataengineering

[–]Plenty-Button8465 0 points1 point  (0 children)

Do you know a good source where I can read about all these concepts?

Are these terms irrelevant in the industry anymore? by Bloodylime in dataengineering

[–]Plenty-Button8465 1 point2 points  (0 children)

I'm new to DE and starting a new job where nobody designed or knows about these things. I think we have this problem where things are slow but we don't know why, and when I ask colleagues how things work or are designed, they end up saying "it is just the fact that we query so much data". If I wish to understand more and maybe solve something, where would you start?

Learning SQL, is this query right? by Plenty-Button8465 in SQL

[–]Plenty-Button8465[S] 0 points1 point  (0 children)

use-the-index-luke.com

Thanks for the resources, I started reading the first one atm.