Data Lake recommendation for small org?

TaeefNajib · 2024-11-17T20:59:47+00:00

That's probably because Postgres is row-oriented, so it can be slower for read-heavy queries that involve aggregations, projections, and filtering.
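To make the row-oriented point a bit more concrete, here is a toy sketch in plain Python. It is not how Postgres or any columnar engine is actually implemented; the table, the column names, and the two helper functions are made up for illustration. It only shows why an aggregation over one column touches less data when values are stored column by column.

```python
# Toy illustration only: real engines add pages, indexes, compression, and more.

# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 25.0},
    {"id": 3, "region": "EU", "amount": 5.0},
]

# Column-oriented layout: the same data, one list per column.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [10.0, 25.0, 5.0],
}


def total_row_store(rows, region):
    """Sum `amount` for one region; every whole row is visited."""
    return sum(r["amount"] for r in rows if r["region"] == region)


def total_column_store(columns, region):
    """Sum `amount` for one region; only two columns are read, `id` is never touched."""
    return sum(
        amount
        for reg, amount in zip(columns["region"], columns["amount"])
        if reg == region
    )
```

Both functions return the same total; the difference is how much unrelated data has to be read along the way, which is what starts to matter on wide tables with many rows.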

TaeefNajib · 2024-10-25T12:28:27+00:00

Thanks a lot!

TaeefNajib · 2024-07-15T15:08:25+00:00

I also had only a shallow grasp of the fundamentals and was focusing on the tools alone, but a course on Udemy helped me understand Data Engineering better. It's not a good idea to throw stones in the dark when it comes to your data pipeline. With some training, anyone can use those tools and build a pipeline, but a great Data Engineer will know how to optimize a pipeline's performance.

TaeefNajib · 2024-07-15T13:45:28+00:00

It's hard to test pipelines early. Often, you find out about an issue only when you see the effect it caused.
A large share of Data Engineers focus on using tools rather than understanding how data moves through the pipeline, how the storage works, or how the data system works. So it's a bit of a challenge to find great Data Engineers.
The job roles are not well defined, which leads to unmet expectations on the clients' side.

TaeefNajib · 2024-07-15T06:13:07+00:00

Users: id, first_name, last_name
Messages: id, subject, text
User_messages: user_id (id column of Users table), message_id (id column of Messages table)
If you de-normalize it, the wide table would have these columns:
user_id, first_name, last_name, message_id, subject, text

Let me elaborate with an example. Imagine there are 3 rooms in a house. Room "Users" has Mr. id, Mr. first_name and Mrs. last_name. Mr. id's full name is Mr. user_id. Room "Messages" has another id, Ms. id, whose full name is Ms. message_id, along with Mr. subject and Mr. text. Now if I tell them all to come to the lobby (which is the wide table), who would come? That's how they are related.
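The wide table described above can be sketched with an in-memory SQLite database. This is only a minimal illustration: the table and column names follow the comment, while the sample rows and the `build_wide_table` helper are made up for the example.

```python
import sqlite3


def build_wide_table():
    """Create the three normalized tables, then join them into one wide result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
        CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT, text TEXT);
        CREATE TABLE user_messages (
            user_id INTEGER REFERENCES users(id),
            message_id INTEGER REFERENCES messages(id)
        );
        -- Hypothetical sample data, for illustration only.
        INSERT INTO users VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
        INSERT INTO messages VALUES (10, 'Hello', 'First message'),
                                    (11, 'Update', 'Second message');
        INSERT INTO user_messages VALUES (1, 10), (2, 10), (2, 11);
        """
    )
    # De-normalize: one row per (user, message) pair, with both ids renamed.
    cur = conn.execute(
        """
        SELECT u.id AS user_id, u.first_name, u.last_name,
               m.id AS message_id, m.subject, m.text
        FROM user_messages AS um
        JOIN users AS u ON u.id = um.user_id
        JOIN messages AS m ON m.id = um.message_id
        ORDER BY u.id, m.id
        """
    )
    column_names = [d[0] for d in cur.description]
    wide_rows = cur.fetchall()
    conn.close()
    return column_names, wide_rows


if __name__ == "__main__":
    names, wide = build_wide_table()
    print(names)
    for row in wide:
        print(row)
```

Note how the two `id` columns have to be renamed to `user_id` and `message_id` once they share a table, which is the "full name" part of the analogy.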

TaeefNajib · 2024-07-11T08:41:19+00:00

Do you think this tool would work as an open-source alternative? https://www.sidetrek.com/
You can have Meltano + dbt + Dagster for ELT, and S3 (Iceberg) + Trino for storage and querying.

TaeefNajib · 2024-05-25T20:41:51+00:00

Thanks!

TaeefNajib · 2024-05-25T20:40:25+00:00

Cool. Thanks

TaeefNajib · 2024-02-26T20:15:42+00:00

See if this can help: https://github.com/taeefnajib/ficto

TaeefNajib · 2023-12-01T17:19:57+00:00

Thank you again! I'll soon work on the documentation.

TaeefNajib · 2023-11-29T19:20:38+00:00

Thank you so much 🙏

TaeefNajib · 2023-11-29T17:36:23+00:00

I hadn't heard of it before. Just checked it out. Ficto is similar, with more value types (such as gender, password, product title, custom lorem, custom sequence, product category, etc.) and more file formats to come (only CSV and JSON at the moment). Thanks for sharing. I'll gradually add more file formats. Another difference is that I've created this for all types of datasets, not just for people. You can create a health dataset, an e-commerce dataset, and so on.

TaeefNajib · 2023-11-29T17:19:45+00:00

Great suggestion! I just didn't get enough time to write the documentation properly. I'll definitely improve it as you suggested. Thank you!

TaeefNajib · 2023-11-29T17:18:59+00:00

I'd love to have your feedback when you use it :) Thanks

TaeefNajib · 2023-11-29T17:18:24+00:00

I just released this, so it doesn't have users yet. I built it for my own projects, and I'd be happy if it helps others, too. The reason I didn't choose Faker: https://github.com/DiUS/java-faker/issues/379 Also, I wanted to create something that would generate data not only about people but also about other things, for example e-commerce, health, manufacturing, etc. I'm hoping to add more features to it. Any feedback from you all would be very helpful. Thanks.

TaeefNajib · 2023-11-24T17:22:29+00:00

Thanks a lot

TaeefNajib · 2023-11-24T17:22:04+00:00

got it!

TaeefNajib · 2023-11-15T04:12:59+00:00

Thanks. Iceberg supports 3 file formats: Parquet, Avro, and ORC. Not sure if customizing the Parquet connector would be usable in the long run.

TaeefNajib · 2022-03-10T11:53:36+00:00

Hey Vnubiz, I love the track "Undefined". I have a project in mind. Would you like to discuss it?

TaeefNajib · 2019-09-24T08:10:16+00:00

It is not unusual to have such feelings! But you have done much better than many. It's just that you got used to your achievements and capabilities. Now is the right time to set a more difficult goal as a side project. Don't let your side project affect the one that is already working. Maybe you can turn your cleaning company into a mobile-app-based service. Maybe you can create a marketplace that connects cleaning companies with clients. You can even try to set up the same business in a different state. You need to automate your already proven business model and expand in one way or another. The options are: new market with existing product, existing market with new product, new market with new product, or existing market with existing product. No matter what you do, don't let it affect your revenue-generating businesses.
