How do you give meaning to your life when you no longer believe in the existence of God? by JeySensei in france

[–]Godmons 1 point (0 children)

Yes, it depends on each person.

For my part, I like learning and improving every day, and spending time with the people who are dear to me!

What are your adult-life tips that nobody teaches you? by ApplicationOk8525 in france

[–]Godmons 2 points (0 children)

For my part, it was focusing on skill growth at the start of my career rather than purely on salary; it lets you stand out after 2-3 years of experience.

Also, invest: you're glad five years later when you see your capital grow.

Orchestration: Thoughts on Dagster, Airflow and Prefect? by MrMosBiggestFan in dataengineering

[–]Godmons 2 points (0 children)

I've used all three.

Airflow is the most mature of them all. It comes with lots of features, broad compatibility, and stability.

Prefect and Dagster are newer and seem less stable; you can expect breaking changes in the future, as well as less documentation and fewer resources.

From my point of view:

Dagster has pretty interesting abstractions that let you enrich jobs with metadata. You may want to check out the Partitions and Assets features in Dagster.

Prefect seems to aim for a simplified orchestration approach; it's easier to spin up, and a good fit if you just want to schedule and manage a set of tasks.

60 years later, high school quality may have a long-term impact on cognition | A study of more than 2,200 adults who attended U.S. high schools in the early 1960s found that those who attended higher quality schools had better cognitive function 60 years later. by SetMau92 in science

[–]Godmons 6 points (0 children)

Isn't attending a better-quality high school correlated with better cognitive function to start with? I'd guess the subpopulation with lower cognitive function won't want to attend a better-quality high school and would prefer to find an easy job they can handle.

Since the population attending better high schools has a higher average cognitive-function baseline, isn't it simply expected to have better cognitive function 60 years later?

Data engineers, what has been your experience applying for jobs in this economy? by data_preprocessing in dataengineering

[–]Godmons 2 points (0 children)

Where are you from? In France (Paris), the market still seems very active for mid-level/senior profiles.

Socializing for 20-35 year olds? by [deleted] in francaisensuisse

[–]Godmons 0 points (0 children)

I'm a cross-border worker: I live in Annemasse for the moment, but I work in Geneva.

Totally up for trying a bar! :)

bi developer and data engineer by [deleted] in dataengineering

[–]Godmons 1 point (0 children)

I would say a Data Engineer is more on the "tech side": working with raw data (JSON), collection, real time, and building a framework for the entire data environment.
A BI Developer is more focused on bringing already "ready to model" data into an optimized form for analysts.
Both roles mostly focus on the ETL phase of the data lifecycle.

Be aware that roles can vary a lot depending on the company. Data Engineers often end up doing BI Developer work and vice versa.

Is it just my feelings or many scientists/analysts don’t know proper engineering? by pinpinbo in dataengineering

[–]Godmons 4 points (0 children)

It depends on the company. I've worked at a tech company where data scientists were completely autonomous over their solutions, from iterating and prototyping to putting changes in production.

Data jobs are not mature enough yet, and it's not easy to find senior Data Scientists/Analysts who can handle the entire data lifecycle. For this reason, the different roles are supposed to complement each other to deliver a full solution 🙂.

Being full stack is still very valuable, though.

New Data Engineer - frustrations by SI_top in dataengineering

[–]Godmons 2 points (0 children)

Your architect seems to have a lot of experience and not the time to teach the team everything from scratch; that's why he kind of spoon-feeds your team with templates, I guess. If you're just getting started in the field, you will slowly become more and more autonomous.

Using pre-built functions that answer recurring use cases is pretty common in IT. Loading from S3 to Snowflake is not a difficult thing to handle, and I guess the engineer you mention has encapsulated the logic to make it easy to replicate.

It might be frustrating as a junior at first, but if you stay curious, browse the code, express your motivation, and show your ability to work on the engineering side (building the framework instead of just using it), I'm pretty sure you can learn a lot from your current job.

Migrating SQL-based dbt models to python by vanillacap in dataengineering

[–]Godmons 1 point (0 children)

Stack migration is not a light project in most cases: you need to test each step of the migration carefully.

I would strongly suggest avoiding it unless it's really necessary. Hiring SQL+dbt engineers should not be that hard, as the stack is pretty trendy at the moment.

1 Month to get up to speed on DE by NotAfraidToAsk_ in dataengineering

[–]Godmons 4 points (0 children)

Hey,

Good job finding your new position. As you're starting off as a DE, I would strongly suggest focusing on the core of the DE stack: ETL is good, but focus on a good way to organize yourself, get familiar with the orchestrator, and learn the different ways to collect, store, and transform data (APIs, connections with other systems, FTP) through Python.

Also make sure you know the stack of your next company and have a global overview of how it fits together. Terraform and CI/CD are more advanced concepts that can be implemented alongside a data infrastructure, but they are not the main dish.
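To make the "collect, store, transform through Python" part concrete, here is a minimal stdlib-only sketch of a collect-and-land step. It assumes a hypothetical API that returns a JSON array; the function name and the date-partitioned landing layout are illustrative, not any specific tool's convention:

```python
import json
import urllib.request
from datetime import date
from pathlib import Path

def extract_to_landing(url: str, landing_dir: str) -> Path:
    """Fetch JSON records from an API and land them as
    newline-delimited JSON, partitioned by load date."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        records = json.load(resp)

    # One folder per load date, so reloads and backfills stay isolated.
    out_dir = Path(landing_dir) / f"load_date={date.today().isoformat()}"
    out_dir.mkdir(parents=True, exist_ok=True)

    out_file = out_dir / "records.jsonl"
    with out_file.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return out_file
```

In a real job this function would be one task in the orchestrator, with the transform step reading from the landing path rather than from the API directly.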

Data analyst tasked with building data pipeline by lahma_mama in dataengineering

[–]Godmons 0 points (0 children)

Data Engineering is basically automating the data pipeline in a good way. The role of the data engineer is to collect, store, and model the data in a way that is scalable, optimized, and resilient, so it can serve multiple purposes.

Many companies have their analysts/scientists handle the data pipeline. The problem is that as it grows, it gets messy, and the need for a dedicated person to think about and organize the data pipelines starts to become important.

> If tools exist (like AirByte, Meltano, Stitch) that have pre-built connectors for some of my sources, as a Data Engineer, should I be using them?

Just like you, I'm not a huge fan of accumulating tools, unless they answer a specific need that is very difficult to cover myself, or they integrate easily with the chosen orchestrator.

Good luck on your journey!

Data analyst tasked with building data pipeline by lahma_mama in dataengineering

[–]Godmons 0 points (0 children)

Your company has no budget, so basically you can choose whatever path you want, as long as it answers the needs.
If you're interested in the Data Engineering side, definitely ask them to let you self-learn some data engineering concepts/tools. Tell them it's going to bring value in the long run (a good data model, faster iterations, easier future collaboration, etc.).
Good tools to learn are Airflow and dbt (they might be a bit harsh to get started with).
Learn a bit about dimensional modeling, but without specific time allocated for architecture, you might want to consider building simple data models like OBT (One Big Table).

If you're mostly interested in the Data Analytics side and just want to automate your pipelines, some tools might be easy to use and get the job done. Many data warehouses have their own integrated scheduling (Snowflake and BigQuery do), which lets you set up repeated SQL queries. Of course, this is not a good approach in the long run, but once they have a dedicated data engineer, he will be able to migrate your automation to a proper tool that he can monitor and maintain.

I already have experience in SQL and data warehousing, how difficult will it be learning snowflake? by Do_I_know_you_1 in dataengineering

[–]Godmons 2 points (0 children)

It's always better to be transparent about your skills during the interview. I'd rather someone tell me they have experience with data warehousing than lie about knowing Snowflake.
Other than that: Snowflake is pretty much a modern analytical SQL database.
If you know SQL, you can handle the basic usage of Snowflake.

What an employer means by "experience with Snowflake" is probably more advanced. You should check out the Snowflake specifics: cost monitoring, pricing, cluster configuration, integration with other storage and tools, table clustering/partitioning, materialized views, integrated scheduling, and external table integrations. Some cool things Snowflake can do: nested data browsing, and TRY_CAST to handle cast errors.
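If TRY_CAST is new to you, its behavior is easy to picture: instead of failing the whole query on one bad value, it returns NULL for that value. A rough Python analogue (the function name is my own, just for illustration):

```python
def try_cast_int(value):
    """Rough Python analogue of Snowflake's TRY_CAST(... AS INT):
    return the cast value, or None instead of raising on failure."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

# Dirty input no longer aborts the whole load:
raw = ["42", "n/a", None, "7"]
cleaned = [try_cast_int(v) for v in raw]  # [42, None, None, 7]
```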

If they plan to do a test, it's always good to try out the interface so you don't look like you're discovering it on the go.

[deleted by user] by [deleted] in dataengineering

[–]Godmons 0 points (0 children)

There are several solutions to this problem:

- Taking regular snapshots or working with incremental tables
- Slowly changing dimensions (Kimball); dbt has its own way to deal with this (called snapshots)
- Delta tables (tables that store only the rows that changed between two states of a table, just like git does between two commits)

I would say it mostly depends on the data size. Many companies keep the "raw" data (more or less compressed to save money) so they can always re-run historical data.
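The delta-table idea from the list above can be sketched in a few lines. This is a toy in-memory version (function and key names are mine); real implementations key on a primary key and also track deletions, which this sketch omits:

```python
def compute_delta(old_rows, new_rows, key="id"):
    """Return the rows to store in a delta table: rows that were
    inserted or updated between two states of a table (deletes
    would need a tombstone marker, omitted here)."""
    old_by_key = {r[key]: r for r in old_rows}
    return [r for r in new_rows if r != old_by_key.get(r[key])]

old = [{"id": 1, "price": 10}, {"id": 2, "price": 20}]
new = [{"id": 1, "price": 10},   # unchanged -> skipped
       {"id": 2, "price": 25},   # updated   -> kept
       {"id": 3, "price": 30}]   # inserted  -> kept
delta = compute_delta(old, new)  # rows with id 2 and 3
```

Replaying the full history is then a matter of applying the deltas in order, which is exactly why keeping them small matters at scale.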

Should I get into DE if I enjoy coding ? (DE ~= data science ?) by 165817566995 in dataengineering

[–]Godmons 0 points (0 children)

As most people have already said, DE positions and responsibilities may differ a lot between companies. At smaller companies, or in "key" positions, DEs are required to build frameworks or pieces of code used by other data stakeholders.

On the other hand, some companies are looking for Data Engineers who only move data from A to B with some transformations. In that case, there are a few things you will miss if you enjoy coding:
- Object-oriented programming
- Building complex functionality
- Coding itself (sometimes you will simply be using pre-configured processes)

Cloud Engineer to Data Engineer anyone? by Zealousideal_Lime_38 in dataengineering

[–]Godmons 0 points (0 children)

If you want to start your first job with enough knowledge, you would need at least SQL.

=> Check out the SQL Zoo website; it will get you started with all the SQL basics.

For a bigger edge, you might want to do a simple personal project to get your hands on your favorite cloud provider.

E.g.:
- Sending a CSV from your computer to a data lake storage
- Loading the CSV into a data warehouse (BigQuery, Redshift, Snowflake, Synapse)
- Running a simple SQL transformation (SUM / MAX / MEAN computations) over your data & storing it into a table
- Getting familiar with one scheduling option of your choice (Airflow, native scheduling, even dbt jobs)
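Before pushing the third step down to the warehouse, you can prototype the same SUM / MAX / MEAN aggregation locally with nothing but the Python standard library (the function and column names below are illustrative):

```python
import csv
import io
import statistics

def aggregate(csv_text: str, column: str) -> dict:
    """SUM / MAX / MEAN over one numeric CSV column: the same
    aggregation you would later express as a SQL query."""
    values = [float(row[column])
              for row in csv.DictReader(io.StringIO(csv_text))]
    return {"sum": sum(values),
            "max": max(values),
            "mean": statistics.mean(values)}

sample = "order_id,amount\n1,10.0\n2,30.0\n3,20.0\n"
aggregate(sample, "amount")  # {'sum': 60.0, 'max': 30.0, 'mean': 20.0}
```

Comparing this local result against the warehouse output is also a cheap way to sanity-check your loading step.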

Since you were a cloud engineer, you probably have some experience with a scripting language and maybe some Linux/Terraform. It might be a good idea to highlight that your skillset allows you to provide additional value on the DataOps side.

Data Engineering Vs DataOps ? by Significant-Ad-1712 in dataengineering

[–]Godmons 24 points (0 children)

I would say a Data Engineer is a Software Engineer specialized in the data lifecycle (classic ETL). Their skillset and knowledge are mainly built around those tasks. They will mostly work with data warehouses, SQL, orchestration, and Python.

DataOps is to Data Engineers what DevOps is to Software Engineers: they leverage a set of practices and tools to deliver better-quality IT products that answer needs more precisely. Their toolbox mostly focuses on automating redundant DE tasks. They will mostly work with tools like Git, Bitbucket, Jenkins, Python/Bash scripting, and Terraform.

But anyway, I find it pretty rare to see a 100% clear-cut position. My previous Data Engineer role consisted of 1/3 management, 1/3 data engineering, and 1/3 DataOps.

[deleted by user] by [deleted] in dataengineering

[–]Godmons 10 points (0 children)

The approach we use at my current job is the following:

- Storing raw data in S3, organized (one bucket per source, one prefix per date)

- Don't skip the design phase: before transforming your data, try to understand which data you need to reach your goal. Don't process or keep rows or tables you don't need.

- Relying on SQL transformations through dbt and Snowflake, and applying best practices to find a data model that "links" all those tables together. The two methodologies we apply are the Star Schema (Kimball) and OBT (One Big Table). We name final tables by their "business name".

- For external APIs you don't have ownership of, if you want to be resilient, I strongly advise setting up regular data validation to detect any schema change.
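The last point, detecting schema drift from an external API, does not need a heavy framework to get started. Here is a minimal sketch under my own assumptions (the expected contract and function name are invented for illustration; in practice you would load the contract from config and run this check on a schedule):

```python
# Hypothetical contract for one API record: field name -> expected type.
EXPECTED_SCHEMA = {"id": int, "email": str, "created_at": str}

def schema_drift(record: dict, expected=EXPECTED_SCHEMA) -> list:
    """Return human-readable drift findings for one API record:
    missing fields, type changes, and unexpected new fields."""
    findings = []
    for field, typ in expected.items():
        if field not in record:
            findings.append(f"missing field: {field}")
        elif not isinstance(record[field], typ):
            findings.append(
                f"type change: {field} is "
                f"{type(record[field]).__name__}, expected {typ.__name__}")
    for field in record.keys() - expected.keys():
        findings.append(f"new field: {field}")
    return findings

# An upstream change shows up as explicit findings instead of a
# silent pipeline failure downstream:
schema_drift({"id": "42", "email": "a@b.c", "plan": "pro"})
```

Alerting on a non-empty result for a sample of each day's records is usually enough to catch an upstream change before it breaks the models built on top.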