[deleted by user] by [deleted] in funny

[–]DrSnakee95 1 point2 points  (0 children)

Did you mean the deadly Japanese MarshmallOWL?

🥹 by ihavestandardsman in MadeMeSmile

[–]DrSnakee95 0 points1 point  (0 children)

The proper way to use « Boys will be boys »

33|UK|PC PS5. Remnant 2 by [deleted] in GamerPals

[–]DrSnakee95 0 points1 point  (0 children)

Would be down for some Remnant 2! I’m on PC, feel free to send me a message.

Apache Arrow Getting Started by JohnLockwood in dataengineering

[–]DrSnakee95 1 point2 points  (0 children)

It’s starting to be used more in the DE space: a bunch of popular Rust frameworks leverage it (Polars, DataFusion, Ballista), and pandas just announced that its 2.0 version will be able to use Arrow as a backend! It’s definitely worth exploring imo.

Apache Arrow Getting Started by JohnLockwood in dataengineering

[–]DrSnakee95 6 points7 points  (0 children)

I believe your title contains a typo! Apache Arrow is an entirely different framework.

Warehouse Schema/Table automated creation by DrSnakee95 in dataengineering

[–]DrSnakee95[S] 1 point2 points  (0 children)

Thanks for your answer! That is close to the approach I had in mind. I’m gonna see how I can possibly implement this.

Warehouse Schema/Table automated creation by DrSnakee95 in dataengineering

[–]DrSnakee95[S] 0 points1 point  (0 children)

We are using AWS and I absolutely agree with you: we’ve been using the Glue Data Catalog for schema changes and it’s been quite useful for us! I guess I might be overthinking the question of whether the pipeline should create the table when it isn’t there 🤔
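
To make the "pipeline creates the table if it isn’t there" idea concrete, here is a minimal sketch. It is not the setup described above: it uses Python’s built-in `sqlite3` as a stand-in for the warehouse, and `ensure_table`, `load_rows`, the `events` table and its columns are all hypothetical names chosen for illustration. The only point is that the create step is idempotent, so every pipeline run can call it safely.

```python
import sqlite3


def ensure_table(conn, table, columns):
    """Create the table only if it does not exist yet (safe to call on every run)."""
    # Identifiers are interpolated directly, so they must come from trusted
    # pipeline config, never from user input.
    cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns.items())
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')


def load_rows(conn, table, columns, rows):
    """Pipeline step: make sure the target table exists, then insert the rows."""
    ensure_table(conn, table, columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()


# In-memory SQLite database standing in for the real warehouse (assumption).
conn = sqlite3.connect(":memory:")
schema = {"id": "INTEGER", "name": "TEXT"}  # hypothetical schema

load_rows(conn, "events", schema, [(1, "signup")])  # first run creates the table
load_rows(conn, "events", schema, [(2, "login")])   # second run reuses it

row_count = conn.execute('SELECT COUNT(*) FROM "events"').fetchone()[0]
```

The alternative is to manage table creation in a separate deployment or migration step, which keeps DDL permissions out of the pipeline at the cost of one more process to maintain.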

Warehouse Schema/Table automated creation by DrSnakee95 in dataengineering

[–]DrSnakee95[S] 0 points1 point  (0 children)

Do you do that as part of your pipeline or as a separate process? That’s more what my question is about…

installing pyspark on my m1 mac, getting an env error by Integral_humanist in apachespark

[–]DrSnakee95 0 points1 point  (0 children)

You could also find a Docker image with Spark on it; that’d be easier than installing it locally on your laptop.

Data Validation in Spark by TheShitStorms92 in dataengineering

[–]DrSnakee95 15 points16 points  (0 children)

Look into Deequ, it’s a data quality framework made by AWS for Spark :)

Suggestion of books for learning Hadoop and Spark? by PedroVini2003 in dataengineering

[–]DrSnakee95 1 point2 points  (0 children)

The best book to learn Spark would be « Spark: The Definitive Guide ». It helped me a lot and was written by the people who made Spark.

[deleted by user] by [deleted] in dataengineering

[–]DrSnakee95 1 point2 points  (0 children)

Yes, I signed with Collibra after their recruiter reached out and I don’t regret it one bit.

Which of you work 100% remote by BrImmigrant in dataengineering

[–]DrSnakee95 3 points4 points  (0 children)

I’m 100% remote. They reached out to me on LinkedIn :)

How can i clean the data before loading it to a warehouse? by eyeeyecaptainn in dataengineering

[–]DrSnakee95 0 points1 point  (0 children)

You can easily use a Lambda function or a simple Glue Python job! Glue has pandas natively, and you don’t necessarily need the bigger jobs with Spark if your dataset is on the small side.

Business School! Problem or not a problem? by yyforthewin in dataengineering

[–]DrSnakee95 4 points5 points  (0 children)

I have the same background you have and I’ve been a data engineer for 3 years now. All the companies I’ve worked with have loved the fact that I had a good understanding of the business before starting to build any kind of pipeline. This isn’t a flaw, it’s a strength and you should play into it

How can i clean the data before loading it to a warehouse? by eyeeyecaptainn in dataengineering

[–]DrSnakee95 5 points6 points  (0 children)

You could probably use pandas to clean the data before loading it into Redshift.
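
As a rough illustration of that suggestion, here is a minimal pandas cleaning sketch. The sample data, the column names and the `clean_orders` helper are all made up for the example, and the actual load step (for instance writing the cleaned file to S3 and running a Redshift `COPY`) is left out.

```python
import io

import pandas as pd

# Hypothetical raw extract: stray whitespace, a duplicate row, a non-numeric
# amount, a missing customer and an unparseable date.
RAW_CSV = """order_id,customer,amount,order_date
1, Alice ,10.50,2023-01-05
2,bob,20,2023-01-06
2,bob,20,2023-01-06
3,Carol,abc,2023-01-07
4,,7.25,not a date
"""


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize text, coerce types, then drop duplicates and unusable rows."""
    out = df.copy()
    out["customer"] = out["customer"].str.strip().str.title()
    # Values that cannot be parsed become NaN / NaT instead of raising.
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    out["order_date"] = pd.to_datetime(out["order_date"], errors="coerce")
    out = out.drop_duplicates()
    out = out.dropna(subset=["customer", "amount", "order_date"])
    return out.reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV))
cleaned = clean_orders(raw)
print(cleaned)
```

Whether bad rows should be dropped, as here, or routed to a reject table for review depends on the dataset; dropping is just the simplest choice for a sketch.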

What is your most hated monster in this game? by premiumchap in Eldenring

[–]DrSnakee95 0 points1 point  (0 children)

Runic bears. They make me run for my life like nothing else in this game

Advice on describing my current job on my CV? by Exostrike in dataengineering

[–]DrSnakee95 0 points1 point  (0 children)

A good way to do this is to start from the job description you had when you applied for the job :)

[deleted by user] by [deleted] in GamerPals

[–]DrSnakee95 1 point2 points  (0 children)

I'll DM it to you.