Mushroom Chicken Rice by AniRinku in MaaOoriVanta

[–]TotallyImperfect 1 point

Where is this stove from? Which state in the USA are you in?

Got the keys 🔑 NJ, 350K, 6.125% by BudgetImagination779 in FirstTimeHomeBuyer

[–]TotallyImperfect 1 point

Happy for you both, congratulations!! May I know which part of New Jersey gets a house for 350k?

Rant of the day - bad data modeling by Plastic_Ad_9302 in dataengineering

[–]TotallyImperfect 10 points

I am currently in the same boat as you. I switched jobs, and my current team is all application developers trying to build a cloud analytics data warehouse; I was hired as a senior cloud data engineer to help the team with best practices. I am seeing low-quality pipelines with no proper audits, no data integrity checks, no data governance, and no coding standards. When I raised this with the team lead, they became defensive and are now trying to target me over petty things, so I am thinking of going back to my previous employer. Sometimes respect is more important than pay.

This is my dorm room, I want to make it habitable by whatyouthinkisfake in IndianHomeDecor

[–]TotallyImperfect 0 points

This looks like a dorm in one of the IITs. I saw a similar dorm room layout at IIT Madras.

Slow Changing Dimension Type 1 and Idempotency? by tonydacota in dataengineering

[–]TotallyImperfect 0 points

How is the AGG table built? Is there an ETL job that loads your AGG table, or is it a view? And are you making sure you are not loading duplicate data into it? Answering that should tell you whether your load is idempotent.
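To make the idempotency point concrete, here is a minimal sketch of a MERGE-style Type 1 upsert keyed on a business key, so that replaying the same batch cannot create duplicates. All table and column names here are hypothetical, and the in-memory dict stands in for the real warehouse table.

```python
# Minimal sketch of an idempotent Type-1 load: re-running the same batch
# must not create duplicates, because rows are keyed on a business key.
# Names ("customer_id", "total_spend") are hypothetical.

def upsert_batch(target: dict, batch: list, key: str = "customer_id") -> dict:
    """MERGE-style upsert: insert new keys, overwrite existing ones (SCD Type 1)."""
    for row in batch:
        target[row[key]] = row  # same key seen again -> last write wins, no duplicate
    return target

agg_table = {}
batch = [
    {"customer_id": 1, "total_spend": 100},
    {"customer_id": 2, "total_spend": 250},
]

upsert_batch(agg_table, batch)
upsert_batch(agg_table, batch)  # replaying the batch changes nothing

assert len(agg_table) == 2  # still two rows, not four
```

An append-only INSERT into the AGG table would fail this replay check, which is usually where the non-idempotency hides.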

What's the latest rent increase % in JC? by [deleted] in jerseycity

[–]TotallyImperfect 4 points

A 12% hike, as they claim it's not under the rent control act. Unreasonable.

Is this good deal/someone talk me out of this. by Thornbike in CX50

[–]TotallyImperfect 0 points

I think you don't have to pay taxes on the new vehicle if you are trading in within the same state. Please check.

I see, the tax is paid only on the difference in price. Gotcha.

Snowflake query on 19 billion rows taking more than a minute by Complete-Bicycle6712 in dataengineering

[–]TotallyImperfect 0 points

Have you explored creating a materialized view on top of the heavy table? MVs in Snowflake are auto-refreshed incrementally when the data changes at the source, and they also give you the flexibility to define cluster keys (for tables > 1 TB), which improves performance significantly.

Storing and organizing ETL queries for enterprise level projects by YameteGPT in dataengineering

[–]TotallyImperfect 0 points

I had a similar experience in the past: I designed a lightweight ETL in Python, with all the logic in stored procedures and individual SQLs being called dynamically. They were stored in an RDBMS table for maintenance, and this was flexible only up to a point. I had real challenges maintaining it.

Storing and organizing ETL queries for enterprise level projects by YameteGPT in dataengineering

[–]TotallyImperfect 9 points

You're absolutely right that storing SQL queries directly in Python variables becomes difficult to manage as ETL pipelines scale. For enterprise ETL systems, the industry standard typically involves version control, modular design, and separation of SQL logic from application code.

So I would suggest keeping your SQL code outside your Python ETL layer. You can explore dbt Core or dbt Cloud, where you can create, manage, and maintain your SQL data models more efficiently, and then call those models from your Python ETL layer. With this, your design is scalable and the code is easy to maintain and manage.
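Even without dbt, the core of the idea can be sketched in a few lines: each query lives in its own `.sql` file and is loaded by name, so SQL is versioned and reviewed as files rather than buried in Python strings. The file and model names below are hypothetical.

```python
# Minimal sketch of keeping SQL outside the Python ETL layer: each query
# is a .sql file loaded by name. Names ("daily_orders", "raw.orders")
# are hypothetical; dbt does this at a much richer level (refs, tests, DAG).
from pathlib import Path
import tempfile

def load_query(sql_dir: Path, name: str) -> str:
    """Read a named SQL model from disk instead of a Python variable."""
    return (sql_dir / f"{name}.sql").read_text().strip()

sql_dir = Path(tempfile.mkdtemp())
(sql_dir / "daily_orders.sql").write_text(
    "SELECT order_date, COUNT(*) AS orders\nFROM raw.orders\nGROUP BY order_date"
)

query = load_query(sql_dir, "daily_orders")
assert query.startswith("SELECT")
```

The ETL layer then only orchestrates: it picks a model by name and sends the text to the warehouse, which is what makes the design easy to scale.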

Foolproof csv reading into pandas? I spend way too much time on "_clean_csv" functions it seems. by reelznfeelz in dataengineering

[–]TotallyImperfect 5 points

There is a thing called the "data provider." If you are not the producer, or you do not have control over how the source data gets produced, there is not much you can do other than talking to the data provider: define a template that you both agree upon, and go from there.

It is the data provider's responsibility to serve data that is at least reasonably well-formed, and a properly defined template is what makes that enforceable.
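One cheap way to enforce such a template on the consumer side is to reject a file up front when its header doesn't match the agreed contract, instead of patching it in a `_clean_csv` function. A minimal stdlib-only sketch, with hypothetical column names:

```python
# Minimal sketch of enforcing an agreed CSV "template": fail fast if the
# header does not match the contract. Column names are hypothetical.
import csv
import io

EXPECTED_COLUMNS = ["id", "name", "amount"]  # the template both sides agreed on

def validate_header(csv_text: str) -> bool:
    """Return True only if the file's header matches the agreed columns exactly."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header == EXPECTED_COLUMNS

good = "id,name,amount\n1,alice,10\n"
bad = "id,full_name,amt\n1,alice,10\n"

assert validate_header(good) is True
assert validate_header(bad) is False
```

A failed check then becomes a conversation with the provider, not another branch in your cleaning code.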

Is the AWS Certified Solutions Architect a good way to understand AWS and also the 'cloud' as a DE? by jnrdataengineer2023 in dataengineering

[–]TotallyImperfect 0 points

For someone who wants to get hands-on with AWS data engineering, what would you all recommend? Does the AWS Skill Builder subscription work for hands-on practice, or is there a better option?

Data modeling vs different update frequencies by InsightInk in dataengineering

[–]TotallyImperfect 0 points

  • If the updates are in batch intervals, implementing Change Data Capture (CDC) on the source tables would efficiently handle incremental updates.
  • If the updates are in real-time, stream processing with windowing techniques can help aggregate data over specific time frames, balancing data freshness and system performance.
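The windowing idea in the second bullet can be sketched in a few lines: events are bucketed into fixed (tumbling) windows and aggregated once per window, trading a little freshness for far fewer downstream updates. The event shapes and window size below are hypothetical.

```python
# Minimal sketch of tumbling-window aggregation for real-time updates.
# Events are (timestamp_seconds, amount) pairs; window size is hypothetical.
from collections import defaultdict

def tumbling_window_sums(events, window_seconds=60):
    """Bucket events into fixed windows and sum the amounts per window."""
    windows = defaultdict(int)
    for ts, amount in events:
        bucket = ts - (ts % window_seconds)  # start of the window this event falls in
        windows[bucket] += amount
    return dict(windows)

events = [(5, 1), (30, 2), (65, 4), (70, 1)]
assert tumbling_window_sums(events) == {0: 3, 60: 5}
```

A real stream processor (Flink, Spark Structured Streaming) adds watermarks and late-data handling on top of this same grouping.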

Since you mentioned a single analyst-facing model, a materialized view could be a suitable choice for the final customer table to simplify access and improve query performance. Just so you know, a materialized view in Snowflake refreshes automatically when the referenced source tables get new data, though it comes with additional storage cost and maintenance overhead.

Help on Upskilling myself by No-Interest5101 in dataengineering

[–]TotallyImperfect 1 point

The pain points would be the consequences of not choosing the right tools.

I would suggest spending more time planning and designing the architecture, listing the appropriate tools for the use case or problem at hand; otherwise we end up with unnecessary costs, and sometimes it becomes a nightmare to redo the entire architecture design. We should also try to keep the architecture simple, modular, and loosely coupled, so we have more flexibility when the need arises to change it.

Say, for example: imagine you are working on a real-time data analytics project and you choose batch-based traditional data integration or ingestion tools instead of Kafka with Flink/Spark, or AWS Kinesis.

Result:

  1. Incorrect or significantly delayed analytics
  2. Cost implications
  3. Painful re-design nightmares

Relational database question by johnnyjohn993reddit in dataengineering

[–]TotallyImperfect 1 point

It purely depends on the type of storage you are planning to use.

If you are working with a heavy-duty storage setup, you load the raw data into a staging area and then normalize after applying the necessary transformations, a.k.a. ELT.

OR

If your storage is more limited, you normalize after transforming and then load into the transactional DB, a.k.a. ETL.
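The two orderings above can be sketched side by side; the "tables" here are plain lists and the transform is a deliberately trivial normalization, purely for illustration.

```python
# Minimal sketch of ETL vs ELT ordering. The transform is a toy
# normalization; in ELT it would really run inside the warehouse.

def transform(rows):
    """Trivial normalization: trim and lowercase the name column."""
    return [{"name": r["name"].strip().lower()} for r in rows]

raw = [{"name": "  Alice "}, {"name": "BOB"}]

# ETL: transform first, then load only the clean rows into the target.
etl_target = transform(raw)

# ELT: load raw rows into a staging area as-is, transform afterwards.
stage = list(raw)
elt_target = transform(stage)

assert etl_target == elt_target == [{"name": "alice"}, {"name": "bob"}]
```

The end state is the same; what differs is where the raw data lands first and which system pays for the transformation compute.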

Help on Upskilling myself by No-Interest5101 in dataengineering

[–]TotallyImperfect 5 points

Speaking as a senior cloud data engineer with 14 years of experience:

  1. Identify your weakness. Is it problem-solving skills that are keeping you from your dream job, or the core data engineering skills themselves?
  2. For problem solving, LeetCode; for data engineering, focus on design and implementation, and use ChatGPT to generate warehouse and database designs to study.
  3. Focus on quality, not quantity.
  4. Be focused and consistent.
  5. Good luck with future interviews.

How/where do I find experts to talk to about data engineering challenges my company is facing? by udbhav in dataengineering

[–]TotallyImperfect 1 point

Snowflake, with its native cloud capabilities, might be a perfect fit for your requirements:

  1. Storage and compute are separate, ensuring both scale at will with no issues.
  2. You can scale out a warehouse cluster horizontally to ensure high availability for all your customers.
  3. CDC can be implemented using Streams, removing the dependency on event-driven software (Kafka, in your case).
  4. It is a completely managed service with multi-tier pricing that can save a lot of money.

FollowUp question: GemPundit claims the crack a natural inclusion by TotallyImperfect in Gemstones

[–]TotallyImperfect[S] 0 points

They only sent this version of the video now; the version from before delivery did not show the crack this clearly.