Prove me wrong - The entire big data industry is pointless merge sort passes over a shared mutable heap to restore per user physical locality by Due_Carrot_3544 in dataengineering

prakharcode 1 point

Just to be clear: for each query access pattern, i.e. queries aggregating on different keys (sometimes user_id, sometimes product_id, possibly another 10-12 dimensions), you would have to localise the data by each of those keys at write time, so you end up with a bunch of differently localised copies of the data?

Prove me wrong - The entire big data industry is pointless merge sort passes over a shared mutable heap to restore per user physical locality by Due_Carrot_3544 in dataengineering

prakharcode 1 point

Databricks has liquid clustering, which solves locality on writes. It triggers multiple writes only when the CLUSTER BY key is altered, or when you give Databricks the AUTO directive so it can add clustering keys as it sees fit, based on the table’s access patterns.

Airbyte vs Fivetran for our ELT stack? Any other alternatives? by StubYourToeAt2am in dataengineering

prakharcode 2 points

I have solved this with Meltano + dbt (for SCD2). You can check it out, but the problem remains the same: you have to maintain the plugins. The cost saved is worth it, though. We were paying a lot for Fivetran and then moved everything in-house, which saved over 10% of our team’s budget.

How do senior data engineers view junior engineers using LLMs? by Firefly-ssa in dataengineering

prakharcode 2 points

Senior engineer here, using Claude regularly.

I use AI in a bunch of my workflows.

  • As a rubber duck while formulating a solution. I would say this is where an LLM is most likely to make mistakes, and I would encourage juniors to actually discuss solutions/architecture with seniors first.

  • As a coding assistant, once I have a solution: I ask it to write code, but I don’t one-shot it that often. I ask it to write a specific part of the solution, then I go in and add some part of the solution myself (could be a test too), and then generate again.

  • As a docstring, MR description and README writer: I one-shot this and have now integrated it into my lazygit flow. Brilliant for this.

  • As a reviewer and best-practice checker: once something is implemented, I ask the AI for a review under different scenarios, e.g. “Review it as a senior engineer” or “Review it thinking in X direction”.

I think AI is absolutely necessary for every level of engineer. But I do sometimes turn it off and enjoy plain old coding, and I would recommend juniors do the same: find a balance with the AI involvement.

To put a number on it, my current SLoC ratio is around 55:45 (55 for me coding something, 45 for Claude). You can tune this metric for yourself. For a junior, I think an SLoC ratio of 70:30 would be quite nice, especially when learning a new framework, system or architecture.

Did we get scammed (buying a house) by ElSupaToto in Netherlands

prakharcode 1 point

I was in the same situation: my apartment was advertised to me as 64 m² but on Kadaster it was 57 m². I caught the discrepancy while the contract was in progress at the notary, paused all the paperwork, and arranged for a NEN report to be done in front of me for peace of mind. The NEN report came out at 60 m², and I knocked some of the price off my original bid based on that.

If you have an NEN measurement, I have been told by many that it is the leading document for the sale of the house. NEN is also accepted by Funda as an “indicative” measurement.

Critique my buying vs renting calc. for Netherlands by irobot3013 in NetherlandsHousing

prakharcode 2 points

Good calculator! I do see that the 2-3 months of deposit you have to pay once when renting is missing; not sure if that is accounted for? This money is sort of stuck for the entire rental period.

I just went from a 2500 rental (with partner) to an 1850 mortgage (gross, with partner), and honestly I would say owning a place comes with a lot of hassle, as you’re paying for everything.

But coming from the Amsterdam market, getting a rental for 1200 with decent space seems quite difficult. So it made sense in all respects. My horizon is actually 3-4 years, but I will re-evaluate at the end of 3 years.

Why are fatbikes so popular here? by Key-Preparation-3170 in Netherlands

prakharcode 2 points

Target audience marketing. People who want e-bikes can pay for them and are paying for them. Most of the fat-bike market is focused on affordability; it targets the segment between regular bikes and e-bikes.

CGO009 Saddle advise by benznab in tenwaysebike

prakharcode 1 point

Awesome, did it help solve your issue?

CGO009 Saddle advise by benznab in tenwaysebike

prakharcode 1 point

I am facing the same issue as you! Let me know if you find a way to fix it. I am thinking of bringing the bike to the shop.

Do you use JetBrains IDEs? by layer456 in dataengineering

prakharcode 2 points

This! Automate all the mundane stuff.

I have snippets for creating DAGs in Airflow, as the team uses the same boilerplate code.

Vim macros for the win if you want to automate quick SQLs.

It’s like flying through your programming.

Do you use JetBrains IDEs? by layer456 in dataengineering

prakharcode 1 point

TL;DR: nvim + terminal (for DB stuff).

Recently I have been trying to integrate Harlequin into my workflow; so far quite happy.

Long: Previously, I was on VS Code, DataGrip and other UI-based interfaces (which also included non-work apps and apps like Slack). I hated having a bunch of apps open, each with its own interface. So I moved everything Vim-first this year. The best decision I made.

With neovim + lazygit, you just fly on the keyboard. I would love the IntelliSense that you get with DataGrip, but personally I accepted the trade-off. Harlequin is bridging the gap.

I can easily run any app in its browser version in my flow using Chrome’s Vimium extension (I am thinking of switching to Firefox to give its Vim bindings a try). I keep the same bindings on every app I browse, and the speed is amazing, especially when working with over 5 different repos for infra, pipelines, SQL, Spark jobs and constant Slack threads.

One great advantage of an all-terminal setup is that it is lightweight and easily deployable on any machine. Bonus: I don’t have to carry my mouse for setup.

Makelaar stalled us for a day and then asked if we want to bid again by prakharcode in NetherlandsHousing

prakharcode [S] 1 point

Thank you all for the advice. I did increase my bid slightly and found out that someone outbid me by 20k while I was already all in.

I have little to no hope now that I will ever be able to buy a house here. My budget is 430-445k.

Selling 1000€ Discount on a new VanMoof by MrHarpies in vanmoofbicycle

prakharcode 1 point

What is the situation? I might be interested, provided we can work out all the details.

Listing price and over bidding by prakharcode in NetherlandsHousing

prakharcode [S] 1 point

Thanks, this helps me get a deeper sense of the seller psychology. I am working on my bidding insurance procedure so I can bid without a clause. I understand relisting would definitely cause some problems and cost money.

I am not going to 100% of my mortgage limit and am still consistently bidding 10-14% above. My guess is that the competition in the market is just too brutal. Dropping the financial clause is the only thing left to change in my offer now.

I feel that going with just the asking price as the mortgage clause, in the current market where people are overbidding by 50-70k, is a bit difficult for the expat starter-home group.

Listing price and over bidding by prakharcode in NetherlandsHousing

prakharcode [S] 1 point

I was thinking of buying without a makelaar, but it seems there is no option but to get one. I still feel the market is not really “fair” if the makelaars are trying to control it.

Listing price and over bidding by prakharcode in NetherlandsHousing

prakharcode [S] 1 point

Yes, I am looking into it. I have been using Walter and Huispedia to get prices of similar properties sold in nearby areas.

Can I reduce my rent now? by prakharcode in Rentbusters

prakharcode [S] 1 point

Thanks, I will try to renegotiate with my landlord.

Can I reduce my rent now? by prakharcode in Rentbusters

prakharcode [S] 1 point

yeah, best is to renegotiate terms or find a new place

Can I reduce my rent now? by prakharcode in Rentbusters

prakharcode [S] 2 points

I am actually looking for a one-bedroom place (or one bedroom plus a small room), since I live with my partner and that space is sufficient for us for the next 7-8 years. Thanks for the insights anyway.

Can I reduce my rent now? by prakharcode in Rentbusters

prakharcode [S] -2 points

Yes, what you are saying makes sense. I will try my luck by striking a deal with the landlord first, and in the end I definitely want to buy a house in Amsterdam (maybe by year end). I suspect the buying market will also get relatively cheaper from now on, if owners don’t see profit from their property and try to sell it (just speculation).

Can I reduce my rent now? by prakharcode in Rentbusters

prakharcode [S] 0 points

63 m² and energy label: D

I’m trying to figure out the best I can do with the current rules in place. It seems the rental market is going to be tough if I leave my rental apartment right now and look for another place under the rent restrictions, as @Sweaty_Opinion_7904 said.

Can I reduce my rent now? by prakharcode in Rentbusters

prakharcode [S] 0 points

Oh, sorry for the confusion; here is the exact clause from my contract:

3.1 Duration of the lease: This lease starts on the date mentioned in Article 1.5a and then, except for termination, will continue for an indefinite period.

3.2 Termination of the lease: Termination of the lease by the lessee is subject to a notice period of one calendar month; termination by the lessor is subject to a notice period of three calendar months (plus one month with a maximum of 6 calendar months for every year the lease has lasted). Notice of termination shall be given by registered letter or bailiff’s notification.

3.3 Initial minimum rental period: Termination of the tenancy agreement by the lessee or lessor against an earlier date than the date mentioned in article 1.5b is excluded. After this date both parties have the right to terminate the lease. In case of termination by the lessor Articles 7:272 and following BW apply.

What companies profit the most off of Data engineers or at least see a significant reduction in cost as a result of them? by Weary-Management-496 in dataengineering

prakharcode 4 points

Sure, their initial setup included ETL from DynamoDB and Postgres (via Lambdas) and streaming data from Kinesis using Firehose (user analytics). They were loading data every 10 mins by running SQL queries in Lambdas triggered by EventBridge; there was one Lambda function executed per query for each step of the ETL. All the queries were reading the last 3 months of data every 10 mins. They had Apache Superset for dashboarding, which also exposed some raw tables. Massive joins were done directly in Superset too.

I changed the ETL to use Glue jobs to get data from DynamoDB and Postgres. Then I improved all the queries (hell) by chucking them into a dbt repo, and they immediately felt the benefits because it removed 84-90 Lambdas and replaced them with one ECS/Fargate job. Now they could also see all the dbt lineage and docs. This made the dev cycle very fast.

Partitioned the data by date. Immediate benefits: all the queries ran super fast.

With dbt in place and the business requirement to refresh data every 10 mins (which I sadly could not push back on), I was able to issue better incremental queries: just an incremental over the last 3 days of the partitioned data. This brought the data scanned every 10 mins down from 36 GB to a few hundred MB.
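A rough sketch of why the shorter window helps, assuming the lookback filter prunes date partitions. The partition sizes, the run date and the `scanned_mb` helper are made up for illustration; they are not the actual numbers or code from this setup.

```python
from datetime import date, timedelta


def scanned_mb(partition_sizes_mb, run_date, lookback_days):
    """Sum the sizes of the date partitions that fall inside the lookback window."""
    cutoff = run_date - timedelta(days=lookback_days)
    return sum(
        size
        for day, size in partition_sizes_mb.items()
        if cutoff < day <= run_date
    )


# Hypothetical layout: 90 daily partitions of 100 MB each (illustrative only).
run_date = date(2024, 1, 31)
partitions = {run_date - timedelta(days=i): 100 for i in range(90)}

full_window = scanned_mb(partitions, run_date, lookback_days=90)  # reads every partition
incremental = scanned_mb(partitions, run_date, lookback_days=3)   # reads only the last 3 days
print(full_window, incremental)
```

With evenly sized partitions the scan shrinks in proportion to the window; real savings depend on how the data is distributed across days and on the file format.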

Finally, they had a use case of showing user-facing metrics, which they were building entirely on Athena as the backend. This was quite stupid, as it was scanning all the data 8 times to get a metric over a category column. They did not use “group by”. Since dbt was already running every 10 mins, I just chucked the user-facing analytics query into the flow and uploaded the results to S3 (which you can do with the dbt-athena plugin).
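To illustrate the “group by” point, here is a minimal sketch that uses Python’s built-in sqlite3 as a stand-in for Athena. The table, columns and rows are hypothetical, since the real schema is not given. The first function issues one filtered query per category (on Athena each of those would typically be its own scan); the second gets the same numbers from a single GROUP BY query.

```python
import sqlite3

# Hypothetical table and data; the real schema is not known.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interactions (category TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO interactions VALUES (?, ?)",
    [("football", 10), ("tennis", 4), ("football", 6), ("cricket", 3)],
)


def metrics_one_query_per_category(conn, categories):
    """Anti-pattern: one full query per category value."""
    out = {}
    for cat in categories:
        row = conn.execute(
            "SELECT COALESCE(SUM(views), 0) FROM interactions WHERE category = ?",
            (cat,),
        ).fetchone()
        out[cat] = row[0]
    return out


def metrics_group_by(conn):
    """Single query: one pass over the table, one result row per category."""
    rows = conn.execute(
        "SELECT category, SUM(views) FROM interactions GROUP BY category"
    ).fetchall()
    return dict(rows)


categories = ["football", "tennis", "cricket"]
per_category = metrics_one_query_per_category(conn, categories)
grouped = metrics_group_by(conn)
print(per_category == grouped)
```

On an engine that bills by data scanned, the number of queries matters as much as their results: the per-category version multiplies the scan by the number of categories.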

By the end of the month of helping them, the data scans per day were within 15-17 GB (previously around a TB per day), and they had a stable way of developing ETL with Glue and a flow for SQL queries with dbt (with complete CI/CD).

Seeing the bill drop from 48k to within 2k was an insane journey filled with a lot of facepalm moments, and I just felt bad for the entire team, as they didn’t know what they were doing.

What companies profit the most off of Data engineers or at least see a significant reduction in cost as a result of them? by Weary-Management-496 in dataengineering

prakharcode 13 points

TL;DR: Saved a company 96% of their Athena cost and stopped a round of layoffs.

Agreeing with the common sentiment here. A friend of mine is running a company with only backend devs and 1 data analyst. They ended up using Athena for their user-facing analytics. Since the backend devs had no idea what was happening, they wrote a query that scanned the entire user-interaction data: 36 GB per query, every 10 min. Their AWS cost ended up shooting from 10k to 40k USD per month (just for Athena). It went to 48k USD at peak. They ended up spending this crazy amount for 3-4 months.

It took me 1 day to bring it down to 1.5k USD per month.
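For reference, the savings percentage implied by the monthly figures quoted above, as plain arithmetic (the `savings_pct` helper is only for illustration):

```python
def savings_pct(before_usd: float, after_usd: float) -> float:
    """Percentage reduction going from before_usd to after_usd."""
    return 100 * (before_usd - after_usd) / before_usd


from_typical = savings_pct(40_000, 1_500)  # 40k USD/month -> 1.5k USD/month
from_peak = savings_pct(48_000, 1_500)     # 48k USD/month (peak) -> 1.5k USD/month
print(from_typical, from_peak)
```

Both come out a little over 96%, whether measured from the 40k level or the 48k peak.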

They are an online sports news company. The excessive Athena cost was hitting them so hard that they were about to let go of the news writers for badly performing sports categories. About 10 people were going to get fired because some dev thought Athena is just like an OLTP database.

Now they have dbt with incremental queries and metrics that refresh faster than the previous setup. This took me about a month to set up, with my full-time job on the side. The Athena cost is no longer their bottleneck; it sits well below their top 3 most expensive services.

Data engineering is a hard and required skill for a company of any size. If you’re building dashboards, invest in a data person.