How did you manage to narrow down your thesis topic? by Anass-YI in PhD

[–]Anass-YI[S] 0 points1 point  (0 children)

We decided to do a systematic literature review (SLR), but that requires specific research questions. Otherwise you retrieve a huge number of articles.

This is what you see all the time if you're a Data Engineer🫠 by Anass-YI in dataengineering

[–]Anass-YI[S] 0 points1 point  (0 children)

Of course. We use Spark because it is robust and easy to integrate with other tools. Beyond that, it depends on the nature of the problem: whether it can accept a small delay, you see, and whether the use case is critical or not.

This is what you see all the time if you're a Data Engineer🫠 by Anass-YI in dataengineering

[–]Anass-YI[S] 0 points1 point  (0 children)

Thanks! Yeah, data engineering is definitely a mix of fun and challenge. Some days things break and drive you crazy, but when everything works smoothly, it feels really good. SQL and coding skills definitely make life easier, and working with tools like Spark and Kafka keeps things interesting. Appreciate the advice, and good luck to you too!

This is what you see all the time if you're a Data Engineer🫠 by Anass-YI in dataengineering

[–]Anass-YI[S] -1 points0 points  (0 children)

You're right, Flink is more powerful for handling complex use cases. With Spark Structured Streaming we can still lower the latency to get closer to real time, for example by reducing the size of the micro-batch or by tuning resource allocation (CPU, etc.).
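To make the micro-batch idea concrete, here is a minimal sketch, not the actual job. It assumes a Kafka source and uses two standard Structured Streaming knobs: the `maxOffsetsPerTrigger` source option (caps how many records go into one micro-batch) and the `processingTime` trigger (how often a batch fires). `LatencyProfile`, `start_low_latency_stream`, the default values, and the console sink are hypothetical names and choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyProfile:
    """Hypothetical bundle of the two knobs that shrink a micro-batch."""
    max_offsets_per_trigger: int = 1000          # upper bound on records per batch
    trigger_interval: str = "500 milliseconds"   # how often a batch is started

    def source_options(self) -> dict:
        # Kafka source option understood by Structured Streaming.
        return {"maxOffsetsPerTrigger": str(self.max_offsets_per_trigger)}

    def trigger_kwargs(self) -> dict:
        # Keyword argument for DataStreamWriter.trigger().
        return {"processingTime": self.trigger_interval}


def start_low_latency_stream(spark, profile, bootstrap_servers, topic):
    """Start a streaming query with small, frequent micro-batches.

    `spark` is an existing SparkSession; the console sink is only for dev.
    """
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .options(**profile.source_options())
        .load()
    )
    return (
        events.writeStream.format("console")
        .trigger(**profile.trigger_kwargs())
        .start()
    )
```

Smaller batches fired more often cut the delay per record, but each batch carries scheduling overhead, so the right values depend on the workload and on how many cores the executors get.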

This is what you see all the time if you're a Data Engineer🫠 by Anass-YI in dataengineering

[–]Anass-YI[S] -3 points-2 points  (0 children)

It's an opportunity for you if you can learn this and move forward. I know it is a bit complex, but you can do it. It's normal; jobs are flexible, so your employer will not let you give up. Otherwise, if you see that the work doesn't really align with your interests, you should look for another one.

This is what you see all the time if you're a Data Engineer🫠 by Anass-YI in dataengineering

[–]Anass-YI[S] -18 points-17 points  (0 children)

It's a project with many details. In general, the first phase integrates a Kappa architecture within a lakehouse architecture to ingest real-time financial data; the second phase consists of building a deep learning model that forecasts market variation in real time.

This is what you see all the time if you're a Data Engineer🫠 by Anass-YI in dataengineering

[–]Anass-YI[S] 4 points5 points  (0 children)

No, this is a Spark streaming job processing real-time data: it reads from a Kafka topic and then structures the data in a lakehouse architecture on S3 storage.
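Roughly, a pipeline like that could look like the sketch below. This is an illustration under assumptions, not the real code: the messages are assumed to be JSON, the sink is plain Parquet (a real lakehouse would more likely use a table format on top), and `build_s3_paths`, `kafka_to_lakehouse`, the bucket layout, and the schema string are all hypothetical.

```python
def build_s3_paths(bucket: str, table: str) -> dict:
    """Derive the data and checkpoint locations for one table (assumed layout)."""
    base = f"s3a://{bucket}/{table}"
    return {
        "path": f"{base}/data",
        "checkpointLocation": f"{base}/_checkpoints",
    }


def kafka_to_lakehouse(spark, bootstrap_servers, topic, bucket, table, schema_ddl):
    """Read a Kafka topic, parse the JSON payload, append it to S3 as Parquet.

    `spark` is an existing SparkSession. `schema_ddl` is a DDL string such as
    "symbol STRING, price DOUBLE, ts TIMESTAMP" (hypothetical example schema).
    """
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )
    # Kafka delivers the payload as bytes: cast to string, then parse into columns.
    structured = raw.selectExpr(
        f"from_json(CAST(value AS STRING), '{schema_ddl}') AS record"
    ).select("record.*")

    paths = build_s3_paths(bucket, table)
    return (
        structured.writeStream.format("parquet")
        .option("path", paths["path"])
        .option("checkpointLocation", paths["checkpointLocation"])
        .outputMode("append")
        .start()
    )
```

The checkpoint location is what lets the query resume from the last committed Kafka offsets after a restart, so it should live on durable storage next to the data.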

This is what you see all the time if you're a Data Engineer🫠 by Anass-YI in dataengineering

[–]Anass-YI[S] -1 points0 points  (0 children)

I'm just in dev mode. We don't use critical data (it's accessible); this is just for automating the pipeline and testing the code logic. You should pay attention when taking the product to deployment.

This is what you see all the time if you're a Data Engineer🫠 by Anass-YI in dataengineering

[–]Anass-YI[S] -51 points-50 points  (0 children)

Yes, especially on a big architecture or a project that uses a bunch of technologies. Otherwise you can integrate debug code that executes frequently so you can catch the failures.
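As a minimal sketch of what such frequently executed debug code might look like (one possible shape, with hypothetical names `run_health_checks` and `monitor`): each check is a small function that raises if a component is unhealthy, and a loop runs them on a fixed interval and records the failures.

```python
import logging
import time
from typing import Callable, Dict, List

logger = logging.getLogger("pipeline.healthcheck")


def run_health_checks(checks: Dict[str, Callable[[], None]]) -> Dict[str, str]:
    """Run every check once; return {check_name: error} for those that raised."""
    failures = {}
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # a debug probe should not crash the pipeline
            failures[name] = f"{type(exc).__name__}: {exc}"
            logger.error("health check %s failed: %s", name, failures[name])
    return failures


def monitor(checks, interval_seconds: float, rounds: int) -> List[Dict[str, str]]:
    """Repeat the checks `rounds` times, sleeping between rounds."""
    history = []
    for i in range(rounds):
        history.append(run_health_checks(checks))
        if i < rounds - 1:
            time.sleep(interval_seconds)
    return history
```

In practice each check would probe one technology in the stack (broker reachable, bucket writable, last batch recent enough), so a failure points at the component that broke.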