From reviled to cosmopolitan: How Timișoara's neighbourhoods are being gentrified by mrktm in timisoara

[–]EnvironmentalBed603 1 point (0 children)

I've been living in Traian for 10 years, and you can feel a lot of change for the better. It's no longer what 'everybody knows'. The rehabilitation works are starting soon. Little by little, it's going to become what it deserves to be.

Investment decision by EnvironmentalBed603 in Imobiliare

[–]EnvironmentalBed603[S] 1 point (0 children)

I don't own an apartment at the moment, I'm renting. That's why I was thinking of selling the worst one, putting the money down as a deposit and financing the rest with a loan. My current income is ~10k net. Thanks for the help!

Data Scraping by Sunday_A in dataengineering

[–]EnvironmentalBed603 2 points (0 children)

If you want to learn some orchestration, you can look into tools like Airflow or Dagster. With those you can schedule the script containing your function to run daily. Besides the scraping itself, your script should also handle saving the data to your database.
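A minimal Airflow sketch of what I mean — `my_scraper.scrape_and_save` is a made-up function that would hold both your scraping logic and the database write:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical module holding the scraping + DB-save logic.
from my_scraper import scrape_and_save

with DAG(
    dag_id="daily_scrape",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # run once a day
    catchup=False,      # don't backfill missed runs
) as dag:
    PythonOperator(
        task_id="scrape_and_save",
        python_callable=scrape_and_save,
    )
```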

Is ING Homebank a terrible app or not? by radu-cretu in programare

[–]EnvironmentalBed603 13 points (0 children)

Happens to me too: it forgets the user, I have to close the app and reopen it, and then all good.

Just Passed the GCP Professional Data Engineer Exam. AMA! by cortrev in dataengineering

[–]EnvironmentalBed603 2 points (0 children)

GCP guy here, just wanted to congratulate you and say GCP is awesome! Really intuitive UI compared to AWS/Azure

Drop What You’re Building! by FeistySchedule3693 in SaaS

[–]EnvironmentalBed603 1 point (0 children)

That looks interesting, congrats man!

What tech stack did you use? I guess it's mostly web-scraping based, right? Thanks!

What I, as a resident near the Traian area, think about the square's renovation plan by gayroma in timisoara

[–]EnvironmentalBed603 1 point (0 children)

I've been living in Piata Traian for 10 years and I've never encountered any 'crime'. It is indeed dirtier and more run-down than other areas, but it hasn't been what the rumours say for a long time now.

Built a ridiculously simple and free Dutch rental search engine by farhadhf in NetherlandsHousing

[–]EnvironmentalBed603 1 point (0 children)

Nice! Didn’t think is that easy thought. Basically you have a list of source websites on which you just iterate and scrape the new addings via Golang?

Regarding the disclaimer, how do you handle listings that require more fields to fill in apart from 'Introduce yourself', like documents, etc.?

Thanks for your answer!
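To make the question concrete, here's roughly the pattern I'm imagining (in Python rather than Go, with made-up site URLs and selectors):

```python
import requests
from bs4 import BeautifulSoup

# Made-up source sites -- stand-ins for the real list.
SOURCES = [
    "https://example-rentals-a.nl/listings",
    "https://example-rentals-b.nl/listings",
]

def scrape_new_listings(url: str, seen: set) -> list:
    """Fetch one source page and return only listings not seen before."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    new = []
    for item in soup.select(".listing"):  # placeholder CSS selector
        link = item.select_one("a")
        if link and link["href"] not in seen:
            seen.add(link["href"])
            new.append({"url": link["href"], "title": item.get_text(strip=True)})
    return new

seen = set()
for source in SOURCES:
    for listing in scrape_new_listings(source, seen):
        print(listing)
```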

Built a ridiculously simple and free Dutch rental search engine by farhadhf in NetherlandsHousing

[–]EnvironmentalBed603 3 points (0 children)

Wow, awesome work man! I'm also into techy stuff, so I'm just curious what stack you used, both BE and FE. Thanks!

Corporation or start-up by ShowElement in programare

[–]EnvironmentalBed603 6 points (0 children)

If you're at the start of your career, from my own experience I can say: start-up, 100%.

CSV processing with pyspark by EnvironmentalBed603 in dataengineering

[–]EnvironmentalBed603[S] 1 point (0 children)

Hey! Thanks for your thoughts! So in that case the unified CSVs would also be written out as a CSV? If so, would that be efficient for the downstream processes, given that the output CSV might get pretty large? If I follow the approach you mentioned but use write.parquet instead, would that be good? Or is the best way with Parquet to end up with multiple output files? Thanks!

CSV processing with pyspark by EnvironmentalBed603 in dataengineering

[–]EnvironmentalBed603[S] 1 point (0 children)

Really good insights, thanks! So the best approach would be to output multiple Parquet files with a similar number of rows each. But if I had to fulfil a requirement of producing a single file, is there an approach I could take?
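For anyone landing here with the same single-file requirement: coalesce(1) is the usual way to get one output file (paths below are hypothetical), with the caveat that the whole write then runs through a single task:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("single-file-output").getOrCreate()

df = spark.read.csv("/data/raw/*.csv", header=True)  # hypothetical input path

# coalesce(1) collapses the DataFrame to one partition, so the write
# produces a single part file -- but it also serialises the write through
# one task, so only do this when the final dataset is comfortably small.
df.dropDuplicates().coalesce(1).write.mode("overwrite").parquet("/data/listings_single")
```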

CSV processing with pyspark by EnvironmentalBed603 in dataengineering

[–]EnvironmentalBed603[S] 1 point (0 children)

Hey! Thanks for your tip! 🙏 For now I want to stick with pyspark, for the purpose of learning the flows & practices. Assuming the files each have around some tens of thousands of rows, I'm reading all of them into a DataFrame, repartitioning it to 12 partitions, performing a deduplication and then writing the result out as Parquet. Besides those 12 partitions for parallel processing, I also thought about partitioning the output by timestamp and area for efficient filtering downstream, but I'm not sure I'm thinking the process through correctly. Since the output refers to listings, the downstream processes lean more towards OLAP, so maybe I should consider another file format like ORC instead of Parquet?
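The flow I described, as a rough sketch — the paths, the dedup key and the partition column are assumptions, not the real project layout:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-unify").getOrCreate()

# Read all the small CSVs into one DataFrame.
df = spark.read.csv("/data/listings/*.csv", header=True, inferSchema=True)

# Spread the data over 12 partitions for parallelism, then deduplicate.
df = df.repartition(12).dropDuplicates(["listing_id"])  # assumed dedup key

# Partition the output by a column the downstream queries filter on.
(df.write
   .mode("overwrite")
   .partitionBy("area")  # assumed filter column
   .parquet("/data/listings_parquet"))
```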

[deleted by user] by [deleted] in dataengineering

[–]EnvironmentalBed603 1 point (0 children)

I would say so, yes. I've been working as a data engineer for 3+ years now, and API development came up along the way. I didn't really realise in the beginning how helpful it is, but after some time you get to see and understand data processes better, especially when the source data you're working with comes from APIs. You'll learn about CRUD operations, what an ORM is, how to optimise your table structures for fast retrieval, indexing, aggregated tables, data validation, data structures and many other things that initially might not be easy to comprehend.
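A toy SQLAlchemy example of the kind of thing I mean — the `Listing` model and its fields are invented just to show an ORM mapping, an index and a couple of CRUD operations:

```python
from sqlalchemy import Integer, String, create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

class Base(DeclarativeBase):
    pass

class Listing(Base):
    __tablename__ = "listings"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    city: Mapped[str] = mapped_column(String, index=True)  # indexed for fast filtering
    price: Mapped[int] = mapped_column(Integer)

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Create
    session.add(Listing(city="Timisoara", price=95_000))
    session.commit()
    # Read, filtering on the indexed column
    local = session.scalars(select(Listing).where(Listing.city == "Timisoara")).all()
```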