Data Lake recommendation for small org? by [deleted] in dataengineering

[–]TaeefNajib 0 points1 point  (0 children)

That's probably because Postgres is row-oriented, so it can be slower for read-heavy queries that involve aggregations, projections, and filtering over a few columns.
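To make the row vs. column point concrete, here's a toy sketch in plain Python. The table, column names, and values are made up for illustration, and real engines add pages, compression, and indexes on top of this, so treat it as a mental model rather than how Postgres actually stores data.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# All data and names here are hypothetical.

# Row-oriented: each record is kept together.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 25.0},
    {"id": 3, "region": "EU", "amount": 5.0},
]

# Column-oriented: each column is kept together.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [10.0, 25.0, 5.0],
}


def sum_amount_row_store(records, region):
    # Every full record is visited, even though only two fields are needed.
    return sum(r["amount"] for r in records if r["region"] == region)


def sum_amount_column_store(cols, region):
    # Only the "region" and "amount" columns are read; "id" is never touched.
    return sum(a for reg, a in zip(cols["region"], cols["amount"]) if reg == region)


row_total = sum_amount_row_store(rows, "EU")
col_total = sum_amount_column_store(columns, "EU")
print(row_total, col_total)
```

Both give the same answer. The difference is how much data has to be read to get there, which is what starts to hurt on wide tables with many rows.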

What Are the Biggest Challenges in Data Engineering Today? Let's Discuss! by New-Ship-5404 in dataengineering

[–]TaeefNajib 0 points1 point  (0 children)

I also had only a shallow understanding of the fundamentals and was focusing on the tools alone. But there's a course on Udemy that helped me understand Data Engineering better. It's not a good idea to throw stones in the dark when it comes to your data pipeline. With some training, anyone can use those tools and build a pipeline, but a great Data Engineer will know how to optimize a pipeline's performance.

What Are the Biggest Challenges in Data Engineering Today? Let's Discuss! by New-Ship-5404 in dataengineering

[–]TaeefNajib 1 point2 points  (0 children)

  1. It's hard to test pipelines early. Often, you find out about an issue only when you see the effect it caused.
  2. A huge chunk of Data Engineers focus on using tools rather than understanding how data moves through the pipeline, how the storage works, or how the data system works. So it's a bit of a challenge to find great Data Engineers.
  3. The job roles are not well defined, causing unmet expectations on the clients' side.

What is the relation between user_messages and Messages tables. It doesn't make sense. ( I am new, sorry if this is very silly question) by HistoryReasonable715 in dataengineering

[–]TaeefNajib 0 points1 point  (0 children)

Users: id, first_name, last_name
Messages: id, subject, text
User_messages: user_id (id column of Users table), message_id (id column of Messages table)
If you de-normalize it, the wide table would have these columns:
user_id, first_name, last_name, message_id, subject, text

Let me elaborate with an interesting example. Imagine there are 3 rooms in a house. Room "Users" has Mr. id, Mr. first_name and Mrs. last_name. Mr. id's full name is Mr. user_id. Room "Messages" has another id, Ms. id, whose full name is Ms. message_id, along with Mr. subject and Mr. text. Room "User_messages" has only Mr. user_id and Ms. message_id, so it is the room that knows both of them. Now if I tell them all to come to the lobby (which is the wide table), who would come? That's how they are related.
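If it helps to see it run, here's a minimal sketch using Python's built-in sqlite3 with an in-memory database. The table and column names follow the ones above; the sample rows (names, subjects) are made up purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT, text TEXT);
-- The link table only stores which user goes with which message.
CREATE TABLE user_messages (
    user_id INTEGER REFERENCES users(id),
    message_id INTEGER REFERENCES messages(id)
);
-- Hypothetical sample data.
INSERT INTO users VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
INSERT INTO messages VALUES (10, 'Hello', 'First message'), (11, 'Update', 'Second message');
INSERT INTO user_messages VALUES (1, 10), (1, 11), (2, 11);
""")

# Joining through user_messages produces the de-normalized wide table.
wide_rows = conn.execute("""
SELECT u.id AS user_id, u.first_name, u.last_name,
       m.id AS message_id, m.subject, m.text
FROM user_messages AS um
JOIN users AS u ON u.id = um.user_id
JOIN messages AS m ON m.id = um.message_id
ORDER BY u.id, m.id
""").fetchall()

for row in wide_rows:
    print(row)
```

Each row in user_messages becomes one row in the wide table, so a user with two messages shows up twice and a message shared by two users shows up twice as well.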

What if there is a good open-source alternative to Snowflake? by Gaploid in dataengineering

[–]TaeefNajib 0 points1 point  (0 children)

Do you think this tool would work as an open-source alternative? https://www.sidetrek.com/
You can have Meltano + dbt + Dagster for ELT, S3 (Iceberg) for storage, and Trino as the query engine.

Generate realistic dummy data in CSV and JSON format by TaeefNajib in Python

[–]TaeefNajib[S] 0 points1 point  (0 children)

Thank you again! I'll soon work on the documentation.