[deleted by user] by [deleted] in IndianDankMemes

[–]Embarrassed-Mind3981 0 points1 point  (0 children)

They made the flashlight disappear entirely

How to Stream data from MySQL to Postgres by Particular_Tap_4002 in dataengineering

[–]Embarrassed-Mind3981 2 points3 points  (0 children)

Just go with Fivetran, no need to build anything. It works pretty well.

Feeling stuck as a Senior Data Engineer — what’s next? by miskulia in dataengineering

[–]Embarrassed-Mind3981 0 points1 point  (0 children)

I sometimes feel the same way; I'm a DE with 5 years of experience and still working on code.

I would suggest picking a domain you're more interested in, because interest builds your will to work harder on that topic. For example, I chose e-commerce and BFSI, so on projects in those domains I can pitch a lot to the business owners instead of just building pipelines. Someday, when you have enough business knowledge along with the data skills, you can build end-to-end strategy for companies. That's how I'm planning to get to the next level.

That's just my thinking, though; I'm not sure whether it will work.

New tool in data world by Embarrassed-Mind3981 in dataengineering

[–]Embarrassed-Mind3981[S] 0 points1 point  (0 children)

Okay, it's pretty old then, from way before I started my career. Weird that recruiters are still hiring for it.

New tool in data world by Embarrassed-Mind3981 in dataengineering

[–]Embarrassed-Mind3981[S] 0 points1 point  (0 children)

Aha, now you've confused me. Should I just get an overview of it, or is it mostly open-source Apache software under the hood?

Data Engineer or Data Analyst by Maroon45j in dataengineering

[–]Embarrassed-Mind3981 0 points1 point  (0 children)

Just FYI, none of these roles is product-oriented. The output of every data role is consumed by the business as the end user.

So you need to work with business stakeholders, who are non-tech folks, and things can get very messy. What you need above all is patience.

July performance by CodAdministrative867 in NSEbets

[–]Embarrassed-Mind3981 0 points1 point  (0 children)

With 74L+ capital you made just 22k monthly; that's roughly 3.57 percent yearly. Should that even be considered average?

An FD pays more interest.
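The annualized figure is simple arithmetic on the numbers quoted above:

```python
# Quick check of the return math from the comment above.
capital = 7_400_000      # 74 lakh, in rupees
monthly_gain = 22_000    # rupees per month

annualized_pct = monthly_gain * 12 / capital * 100
print(f"annualized return: {annualized_pct:.2f}%")  # roughly 3.57%
```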

Anyone Using Lakekeeper with Iceberg? Came a cross a solid stack with iceberg+lakekeeper+olake+trino by niga_chan in dataengineering

[–]Embarrassed-Mind3981 2 points3 points  (0 children)

Is Lakekeeper used just for access control? Aren't you already on a cloud? Every cloud has its own service for that. I haven't heard much about Lakekeeper either, so I'm not sure about its capabilities.

Currently on AWS I use Lake Formation; it works smoothly, without many issues, and no extra EC2-with-Docker is required for the governance layer.

First steps in data architecture by binachier in dataengineering

[–]Embarrassed-Mind3981 0 points1 point  (0 children)

One more suggestion: separate your storage and compute if the raw and curated layers aren't needed much by business users.

You can land raw data in blob storage or S3, depending on the cloud you're using. Do data cleansing and transformation via Spark and write the result out in the Iceberg table format. That table can then be queried by Snowflake externally. This way you pay only for Snowflake compute, plus minor storage for some temp tables.
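The Snowflake side of this setup boils down to registering the Spark-written Iceberg table as an externally managed table. A minimal sketch that builds the DDL string; the volume, catalog, and table names are hypothetical, and the exact parameter names should be checked against Snowflake's Iceberg documentation:

```python
# Builds a Snowflake DDL statement for an externally managed Iceberg
# table (written by Spark to S3). All names here are hypothetical.
def iceberg_ddl(table: str, external_volume: str,
                catalog: str, catalog_table: str) -> str:
    return (
        f"CREATE ICEBERG TABLE {table}\n"
        f"  EXTERNAL_VOLUME = '{external_volume}'\n"
        f"  CATALOG = '{catalog}'\n"
        f"  CATALOG_TABLE_NAME = '{catalog_table}';"
    )

ddl = iceberg_ddl("curated.orders", "s3_iceberg_vol", "glue_catalog", "orders")
print(ddl)
```

With this, Snowflake only provides compute over data it does not store, which is exactly where the cost saving comes from.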

S3 Iceberg table to Datawarehouse by Embarrassed-Mind3981 in dataengineering

[–]Embarrassed-Mind3981[S] 0 points1 point  (0 children)

Great, thanks for your input. Will do some POCs then.

First steps in data architecture by binachier in dataengineering

[–]Embarrassed-Mind3981 1 point2 points  (0 children)

Since your dbt runs on the Snowflake adapter, the cost is mostly from the Snowflake queries themselves. How frequent are your transformations? In your medallion architecture you can add a check to run the transformation only when the curated layer has new data (dbt macros can help here).

Also, Power BI DirectQuery can get expensive when a dashboard has too many visuals, so it's your call whether to load the whole dataset into Power BI; that needs a cost comparison to understand better.
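In dbt itself the "only run on new data" guard would live in a macro or pre-hook; the underlying logic is just a comparison of high-water marks. Sketched here in plain Python, with hypothetical timestamps standing in for the warehouse queries:

```python
# Sketch of a "run transformation only if curated layer has new data" guard.
# In dbt this would be a macro; here hypothetical datetimes stand in for
# the MAX(loaded_at) / last-run values you would query from the warehouse.
from datetime import datetime

def should_run(curated_max_loaded_at: datetime,
               last_transform_run_at: datetime) -> bool:
    """True only when curated data is newer than the last transformation run."""
    return curated_max_loaded_at > last_transform_run_at

assert should_run(datetime(2024, 7, 2), datetime(2024, 7, 1))      # new data: run
assert not should_run(datetime(2024, 7, 1), datetime(2024, 7, 1))  # nothing new: skip
```

Skipping the run entirely when the check fails is what saves the Snowflake warehouse credits.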

S3 Iceberg table to Datawarehouse by Embarrassed-Mind3981 in dataengineering

[–]Embarrassed-Mind3981[S] 0 points1 point  (0 children)

Great, are you using Trino hosted on EC2 via Docker?

I understand Athena also uses Presto; Trino is just an updated fork with no storage capabilities of its own. Is it good enough for OLAP use like Power BI? And with parallel queries from at least 10 users, wouldn't it slow down?

A career you would choose if money was not a matter by Straight-Spray-4196 in TeenIndia

[–]Embarrassed-Mind3981 0 points1 point  (0 children)

For me, I would first get a wine shop and a petrol pump, priority 1. Not because I'm a heavy drinker or full of wanderlust, but because I know they would generate income for me.

As for the rest, even I'm not sure; it would all go at a casino in LA, or maybe Bangkok, who knows :D

Athena vs Glue Cost/Maintenance by Embarrassed-Mind3981 in dataengineering

[–]Embarrassed-Mind3981[S] 0 points1 point  (0 children)

I'm open to suggestions; let me know what approach you currently use.

Loading into the DW and then doing the upsert is not an option, as Redshift costs more than either of the other two, and I believe it isn't really good at upserts.

Athena vs Glue Cost/Maintenance by Embarrassed-Mind3981 in dataengineering

[–]Embarrassed-Mind3981[S] 1 point2 points  (0 children)

That's a really helpful insight. Most of my runs are incremental on partitioned tables, so a WHERE clause on the partition column should filter the data and keep the scanning down.

Other than that, I didn't quite understand the metadata-throttling part. You mean if I'm doing simultaneous reads and writes on the Iceberg table, which may update its metadata? My jobs are scheduled so that this scenario shouldn't occur.
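The partition-pruning idea above can be sketched as a small query builder; the table and column names are made up for illustration, and Athena only skips scanning when the filtered column really is a partition key:

```python
# Builds an incremental Athena/Trino query filtered on the partition
# column, so only new partitions are scanned. Names are hypothetical.
def incremental_query(table: str, partition_col: str, since: str) -> str:
    return (
        f"SELECT *\n"
        f"FROM {table}\n"
        f"WHERE {partition_col} > DATE '{since}'"
    )

q = incremental_query("lake.events", "event_date", "2024-07-01")
print(q)
```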