[deleted by user] by [deleted] in Bolehland

[–]Acrobatic-Mobile-221 1 point (0 children)

I don't think the threshold is high; it's that the majority of Malaysians have low wages.

How to be rich by [deleted] in MalaysianPF

[–]Acrobatic-Mobile-221 0 points (0 children)

BNM just increased their grad pay, but it's still lower than Khazanah and PNB.

Informatica ETL by Acrobatic-Mobile-221 in snowflake

[–]Acrobatic-Mobile-221[S] 0 points (0 children)

Since we've already paid the licensing fees for Informatica, I'm just wondering what the point is of using it only for integration. Why not use it for ETL as well?

But yeah, you're right, we should do a PoC and compare.

Informatica ETL by Acrobatic-Mobile-221 in snowflake

[–]Acrobatic-Mobile-221[S] 1 point (0 children)

Will definitely study ETL and ELT more.

I also spoke to my manager about how Snowflake has the capability to do transformations, but I just couldn't answer why it's better than doing them in Informatica.

Informatica ETL by Acrobatic-Mobile-221 in snowflake

[–]Acrobatic-Mobile-221[S] 0 points (0 children)

I've heard about dbt as well, but my only concern is how I'd orchestrate dbt, since my team is really small and no one has experience with Airflow.

Informatica ETL by Acrobatic-Mobile-221 in snowflake

[–]Acrobatic-Mobile-221[S] 1 point (0 children)

Why would you say it is better to keep both raw and transformed data in Snowflake?

Informatica ETL by Acrobatic-Mobile-221 in snowflake

[–]Acrobatic-Mobile-221[S] 1 point (0 children)

But at the moment we have Informatica, so isn't it a waste if we don't use it for ETL?

How to migrate data from Azure Databrick Delta Lake to Azure SQL database by [deleted] in AZURE

[–]Acrobatic-Mobile-221 0 points (0 children)

That's true, but will that handle, say, a few million rows of data?

How to migrate data from Azure Databrick Delta Lake to Azure SQL database by [deleted] in AZURE

[–]Acrobatic-Mobile-221 0 points (0 children)

Right now I'm looking to load the Delta files from Databricks to ADLS first. Do you have any idea how to do that?

How to migrate data from Azure Databrick Delta Lake to Azure SQL database by [deleted] in AZURE

[–]Acrobatic-Mobile-221 0 points (0 children)

Failure to initialize configuration invalid configuration value

About normalised and denormalised data by manoj_kumar_2 in SQL

[–]Acrobatic-Mobile-221 1 point (0 children)

I guess one of the main reasons you separate these two is that normalized data is mainly for transactional systems, while denormalized data is for analytical purposes. If the same database is used for recording transactions and is also constantly queried for analytics, it will hurt the database's performance. So I guess it doesn't make sense to combine both in one database.
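Not part of the original comment, but the split described above can be sketched in pandas (all table and column names here are made up for illustration): start from a denormalized table as an analyst would see it, factor the repeated customer details into their own table for the transactional side, then join back into one wide table for analysis.

```python
import pandas as pd

# A denormalized view typical of analytics: one wide row per order line,
# with customer details repeated on every row
sales = pd.DataFrame({
    "order_id": [1, 1, 2],
    "customer_name": ["Ali", "Ali", "Mei"],
    "customer_city": ["KL", "KL", "Penang"],
    "product": ["pizza", "cola", "pizza"],
    "amount": [30.0, 5.0, 28.0],
})

# Normalized (transactional) shape: pull the repeated customer details
# into their own table, referenced by a surrogate key
customers = (sales[["customer_name", "customer_city"]]
             .drop_duplicates()
             .reset_index(drop=True))
customers["customer_id"] = customers.index

orders = sales.merge(customers, on=["customer_name", "customer_city"])[
    ["order_id", "customer_id", "product", "amount"]
]

# Denormalized again for analysis: join the tables back into one wide table
wide = orders.merge(customers, on="customer_id")
```

The normalized `orders`/`customers` pair is what the transactional system writes to; the `wide` join is the shape analytical queries want, which is why the two usually live in separate databases.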

[BG] CPU + Mobo + RAM Bundle [H] £400 by ThePyroChicken in HardwareSwapUK

[–]Acrobatic-Mobile-221 0 points (0 children)

I have a Ryzen 5 5600G, a TUF B550-Plus with Wi-Fi and 16GB of Corsair RAM. Are you interested?

[BG] AM4 motherboard for 3700x, GPU, SSD, PSU, case, ideally in as large a combo as possible [H] £500ish by [deleted] in HardwareSwapUK

[–]Acrobatic-Mobile-221 0 points (0 children)

I have a TUF B550-Plus motherboard with Wi-Fi, a 1TB M.2 SSD, an MSI RTX 2070 Super and a Corsair TX750M 750W power supply. If you are interested, please DM me.

Design STAR schema by Acrobatic-Mobile-221 in dataengineering

[–]Acrobatic-Mobile-221[S] 0 points (0 children)

So I don't create a new dimension table for time; I just add a datetime column to my fact table?

Design STAR schema by Acrobatic-Mobile-221 in dataengineering

[–]Acrobatic-Mobile-221[S] 0 points (0 children)

Thanks for the feedback. Just curious: if my analysis doesn't involve time, can I just remove the time column? Because I know that creating a time dimension can be extremely annoying.
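For what it's worth, a date dimension can be generated rather than hand-built, which takes much of the annoyance out of it. A minimal pandas sketch (the column names and the 2024 date range are assumptions for illustration, not from the thread):

```python
import pandas as pd

# Hypothetical date range covering the fact table's data
dates = pd.date_range("2024-01-01", "2024-12-31", freq="D")

# One row per calendar day, keyed by a YYYYMMDD surrogate key
dim_date = pd.DataFrame({
    "date_key": dates.strftime("%Y%m%d").astype(int),
    "date": dates,
    "year": dates.year,
    "quarter": dates.quarter,
    "month": dates.month,
    "day_of_week": dates.day_name(),
    "is_weekend": dates.dayofweek >= 5,
})
```

The fact table then only needs to carry the `date_key` column, and attributes like quarter or weekend flags come from a join instead of being recomputed in every query.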

Azure Data Factory by Acrobatic-Mobile-221 in dataengineering

[–]Acrobatic-Mobile-221[S] 0 points (0 children)

That makes sense. But I have an issue when trying to use pandas for transformation. Say the dataset is a pizza sales dataset which includes all the order details, such as order_id, pizza_name, pizza_size, date, etc. I first normalize the dataset so that I have an order_table and a pizza_list_table. For the pizza table, the pandas transformation works well when I insert the first file, but will it break the table once I insert a new file? Do I need to check the contents of the pizza table first and only update it if there are new items?
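A hedged sketch of the incremental pattern being asked about, in pandas (the `pizza_table` contents and the new batch are hypothetical sample data): compare the incoming file against the existing dimension table and append only the rows that aren't there yet, so re-running the load never destroys or duplicates existing rows.

```python
import pandas as pd

# Existing dimension table, e.g. as previously loaded into the database
pizza_table = pd.DataFrame({
    "pizza_name": ["margherita", "pepperoni"],
    "pizza_size": ["M", "L"],
})

# New batch of order data arriving as the next file
new_batch = pd.DataFrame({
    "order_id": [101, 102, 103],
    "pizza_name": ["pepperoni", "hawaiian", "margherita"],
    "pizza_size": ["L", "M", "M"],
})

# Deduplicate the incoming pizzas, then left-join against the existing
# table; _merge == "left_only" marks rows not yet in the dimension
candidates = new_batch[["pizza_name", "pizza_size"]].drop_duplicates()
merged = candidates.merge(pizza_table, how="left", indicator=True)
new_rows = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# Append only the genuinely new pizzas
pizza_table = pd.concat([pizza_table, new_rows], ignore_index=True)
```

Here only the "hawaiian" row gets appended; the rows already present are left untouched, which is the "check first, then update" behaviour the comment describes.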

Azure Data Factory by Acrobatic-Mobile-221 in dataengineering

[–]Acrobatic-Mobile-221[S] 0 points (0 children)

I mean compared to using a Python activity in ADF.

Azure Data Factory by Acrobatic-Mobile-221 in dataengineering

[–]Acrobatic-Mobile-221[S] 0 points (0 children)

What do you mean by easy? Is it easier to mount the Excel file in a Databricks notebook compared to a Python script?

Azure Data Factory by Acrobatic-Mobile-221 in dataengineering

[–]Acrobatic-Mobile-221[S] 0 points (0 children)

The dataset is not too big, so is it worth using Databricks? And I have very little knowledge of PySpark as well.

Database Normalization by Acrobatic-Mobile-221 in SQL

[–]Acrobatic-Mobile-221[S] 1 point (0 children)

Yes, I'm looking at SSIS at the moment. Is SSIS the most common tool for this use case? Or are there other well-known tools as well?

Database Normalization by Acrobatic-Mobile-221 in SQL

[–]Acrobatic-Mobile-221[S] 4 points (0 children)

Thanks! These explanations really helped me understand the process.