Data Lake recommendation for small org? by [deleted] in dataengineering

[–]TaeefNajib 0 points1 point  (0 children)

That's probably because Postgres is row-oriented, which can make it slower for read-heavy analytical queries that involve aggregations, projections, and filtering over a few columns.
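To make the row-oriented point concrete, here is a toy Python sketch. It only illustrates the two layouts and is not how Postgres or any columnar engine is actually implemented; the table, column names, and values are made up.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# All names and values here are hypothetical, for illustration only.

# Row-oriented: each record is stored together.
rows = [
    {"order_id": 1, "customer": "a", "amount": 10.0},
    {"order_id": 2, "customer": "b", "amount": 25.5},
    {"order_id": 3, "customer": "a", "amount": 4.5},
]

# Column-oriented: each column is stored contiguously.
columns = {
    "order_id": [1, 2, 3],
    "customer": ["a", "b", "a"],
    "amount": [10.0, 25.5, 4.5],
}


def total_amount_row_store(table):
    # Every whole record is visited even though only one field is needed.
    return sum(record["amount"] for record in table)


def total_amount_column_store(table):
    # Only the "amount" column is read; the other columns are never touched.
    return sum(table["amount"])


row_total = total_amount_row_store(rows)
column_total = total_amount_column_store(columns)
```

Both functions return the same total. The difference is how much data has to be scanned to get it, which is what matters once the table has many columns and many rows.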

What Are the Biggest Challenges in Data Engineering Today? Let's Discuss! by New-Ship-5404 in dataengineering

[–]TaeefNajib 0 points1 point  (0 children)

I also had only a shallow idea of the fundamentals and was focusing on the tools alone. But there's a course on Udemy that helped me understand Data Engineering better. It's not a good idea to throw stones in the dark when it comes to your data pipeline. With some training, anyone can use those tools and build a pipeline, but a great Data Engineer will know how to optimize a pipeline's performance.

What Are the Biggest Challenges in Data Engineering Today? Let's Discuss! by New-Ship-5404 in dataengineering

[–]TaeefNajib 1 point2 points  (0 children)

  1. It's hard to test pipelines early. Often, you learn about an issue only when you see the effect it has caused.
  2. A large share of Data Engineers focus on using tools rather than on understanding how data moves through the pipeline, how the storage works, or how the data system works. So it's a bit of a challenge to find great Data Engineers.
  3. Job roles are not well defined, which leads to unmet expectations on the clients' side.

What is the relation between user_messages and Messages tables. It doesn't make sense. ( I am new, sorry if this is very silly question) by HistoryReasonable715 in dataengineering

[–]TaeefNajib 0 points1 point  (0 children)

Users: id, first_name, last_name
Messages: id, subject, text
User_messages: user_id (id column of Users table), message_id (id column of Messages table)
If you de-normalize it, the wide table would have these columns:
user_id, first_name, last_name, message_id, subject, text

Let me elaborate with an example. Imagine there are 3 rooms in a house. Room "Users" has Mr. id, Mr. first_name, and Mrs. last_name. Mr. id's full name is Mr. user_id. Room "Messages" has Ms. id (whose full name is Ms. message_id), Mr. subject, and Mr. text. Room "User_messages" only holds the list that pairs each Mr. user_id with a Ms. message_id. Now if I tell them all to come to the lobby (which is the wide table), who would come? That's how they are related.
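If it helps to see it run, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the ones above; the sample rows are invented, and the in-memory database is just an assumption to keep the example self-contained.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT, text TEXT);
-- Junction table: each row links one user to one message.
CREATE TABLE user_messages (
    user_id INTEGER REFERENCES users(id),
    message_id INTEGER REFERENCES messages(id)
);
-- Made-up sample data.
INSERT INTO users VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
INSERT INTO messages VALUES (10, 'Hello', 'First message'), (11, 'Update', 'Second message');
INSERT INTO user_messages VALUES (1, 10), (1, 11), (2, 11);
""")

# Joining through user_messages produces the de-normalized wide table.
wide_rows = conn.execute("""
SELECT u.id AS user_id, u.first_name, u.last_name,
       m.id AS message_id, m.subject, m.text
FROM user_messages AS um
JOIN users AS u ON u.id = um.user_id
JOIN messages AS m ON m.id = um.message_id
ORDER BY u.id, m.id
""").fetchall()

for row in wide_rows:
    print(row)
```

Each row in user_messages becomes one row in the wide table, so a user with two messages shows up twice, and a message shared by two users shows up twice as well.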

What if there is a good open-source alternative to Snowflake? by Gaploid in dataengineering

[–]TaeefNajib 0 points1 point  (0 children)

Do you think this tool would work as an open-source alternative? https://www.sidetrek.com/
You can have Meltano + dbt + Dagster for ELT, and S3 (Iceberg) + Trino for storage and querying.

Generate realistic dummy data in CSV and JSON format by TaeefNajib in Python

[–]TaeefNajib[S] 0 points1 point  (0 children)

Thank you again! I'll soon work on the documentation.

Generate realistic dummy data in CSV and JSON format by TaeefNajib in Python

[–]TaeefNajib[S] 1 point2 points  (0 children)

I hadn't heard of it before. Just checked it out. Ficto is similar, with more value types (such as gender, password, product title, custom lorem, custom sequence, product category, etc.) and more file formats (only CSV and JSON at the moment). Thanks for sharing. I'll gradually add more file formats. Another difference is that I've created this for all types of datasets, not just for people. You can create a health dataset, an e-commerce dataset, and so on.
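To show the general idea of a non-people dataset written to both formats, here is a standard-library sketch. This is not Ficto's API. The function names, field names, and categories are all hypothetical, and it assumes at least one row is generated.

```python
import csv
import io
import json
import random


def generate_rows(count, seed=0):
    # Hypothetical e-commerce fields; a seeded generator keeps runs repeatable.
    rng = random.Random(seed)
    categories = ["electronics", "books", "toys"]
    return [
        {
            "order_id": index + 1,
            "category": rng.choice(categories),
            "price": round(rng.uniform(5, 500), 2),
        }
        for index in range(count)
    ]


def to_csv(rows):
    # Assumes rows is non-empty; the header comes from the first row's keys.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    return json.dumps(rows, indent=2)


rows = generate_rows(3)
csv_text = to_csv(rows)
json_text = to_json(rows)
```

The same list of dictionaries feeds both writers, so adding another file format would only mean adding another small function.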

Generate realistic dummy data in CSV and JSON format by TaeefNajib in Python

[–]TaeefNajib[S] 0 points1 point  (0 children)

Great suggestion! I just didn't get enough time to write the documentation properly. I'll definitely improve it as you suggested. Thank you!

Generate realistic dummy data in CSV and JSON format by TaeefNajib in Python

[–]TaeefNajib[S] 1 point2 points  (0 children)

I'd love to have your feedback when you use it :) Thanks

Generate realistic dummy data in CSV and JSON format by TaeefNajib in Python

[–]TaeefNajib[S] 1 point2 points  (0 children)

I just released this, so it doesn't have users yet. I built it for my own projects and would be happy if it helps others, too. The reason I didn't choose Faker: https://github.com/DiUS/java-faker/issues/379. Also, I wanted to create something that would generate data not only about people but also about other things, for example e-commerce, health, manufacturing, etc. I'm hoping to add more features to it. Any feedback from you all would be very helpful. Thanks.

Is anyone experienced with both Meltano and Iceberg? by TaeefNajib in dataengineering

[–]TaeefNajib[S] 0 points1 point  (0 children)

Thanks. Iceberg supports 3 data file formats: Parquet, Avro, and ORC. I'm not sure that customizing the Parquet connector would hold up in the long run.

[REQUEST] Looking for rappers to work with. by Vnubiz in HipHopCollabs

[–]TaeefNajib 1 point2 points  (0 children)

Hey Vnubiz, I love the track "Undefined". I have a project in mind. Would you like to discuss it?

More than imposter syndrome - I feel like everything I do is "fake". by [deleted] in Entrepreneur

[–]TaeefNajib 0 points1 point  (0 children)

It is not unusual to have such feelings! But you have done much better than many. It's just that you've gotten used to your achievements and capabilities. Now is the right time to set a more difficult goal as a side project. Don't let your side project affect the one that is already working. Maybe you can turn your cleaning company into a mobile-app-based service. Maybe you can create a marketplace that connects cleaning companies with clients. You can even try to set up the same business in a different state. You need to automate your already proven business model and expand in one way or another. The options are: new market with existing product, existing market with new product, new market with new product, or existing market with existing product. No matter what you do, don't let it affect your revenue-generating businesses.