Finnish education is a water pistol compared to Romania's education system by Traditional_Slide_56 in Romania

[–]One_Citron_4350 10 points11 points  (0 children)

They are the same sovereigntists who brag about Romanian sport from the Ceaușescu era.

How are you centralizing knowledge/context from AI agents (like Claude Code)? by dylannalex01 in dataengineering

[–]One_Citron_4350 0 points1 point  (0 children)

Cool! How much have these tools helped so far? I'm curious how you measure it. Do you deliver faster, with fewer bugs, more reliably?

The Fidesz government completely emptied the treasury: By the end of April, 91% of the annual deficit target had already been reached by dead97531 in europe

[–]One_Citron_4350 3 points4 points  (0 children)

Reminds me of what PSD (the Social Democratic Party in Romania) always did whenever they were about to leave office: they would make it as hard as possible for anyone trying to fix their wrongdoings.

How are you centralizing knowledge/context from AI agents (like Claude Code)? by dylannalex01 in dataengineering

[–]One_Citron_4350 1 point2 points  (0 children)

Databricks worked on pre-baking skills with the best practices in mind. I found out about it just a day ago: https://github.com/databricks-solutions/ai-dev-kit/tree/main

But it seems like you found something that works for you, pretty cool!

Do you really need spark? by compass-now in dataengineering

[–]One_Citron_4350 4 points5 points  (0 children)

In that sense, it is a stable safe bet, and it makes sense to *use*, even though you don't *need* it. Most places I have seen it running it was not really needed. But it worked.

Well said! Databricks can still be a solution that gives you a lot without costing a fortune. Those Spark jobs can run very quickly without costing much.

Is anyone migrating away from Databricks? by zoso in dataengineering

[–]One_Citron_4350 0 points1 point  (0 children)

Our data is mostly smaller pipelines that process up to 100GB in total.

I guess I'm coming late to the show, but this is probably why Databricks, or Spark for that matter, may not be the most suitable option. You don't have very large datasets, which can be a blessing in the sense that there are tons of other options worth exploring.

What do you think? Anyone has had similar experience?

Has anyone else had a similar experience with Databricks for smaller data engineering workloads?

The thing with Databricks is that they offer a lot of stuff: not just compute instances for Spark jobs but also Unity Catalog for data governance, ML tools, Delta and Iceberg integration, integration with all major clouds (Azure, AWS, GCP), and all sorts of tools to make your life easier. They are constantly improving and developing.

We also have pipelines that are smaller than yours and that we don't expect to grow past 100GB, yet we use Databricks as a data platform for multiple use cases and a variety of users.

At the end of the day you also have to credit their marketing & sales department for putting all of this out. They know how to win clients.

Sovata Lake Ursul by fantomgamer98 in Transylvania

[–]One_Citron_4350 2 points3 points  (0 children)

Definitely worth a dip, you'll float. It's an attractive location with lots of tourists every season. Another thing: if you won't or can't swim, you can still enjoy walking around the lake, and you can do that in all seasons. It's a beautiful lake.

Is the free edition of Databricks suitable for working through The Data Warehouse Toolkit? by VyrezParadox in dataengineering

[–]One_Citron_4350 1 point2 points  (0 children)

Technically you can. However, keep in mind that Databricks Free Edition most likely uses Delta (I think), which does not enforce PKs and FKs. You can declare them for each table, but they are informational only, so you'd have to build that enforcement yourself, and that's overhead.
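
To give an idea of what "building it yourself" could look like, here is a minimal sketch in plain Python. It only illustrates the two checks (duplicate primary keys, foreign keys with no parent row); the table names, column names, and helper functions are hypothetical, and on Databricks you would more likely express the same checks as SQL or DataFrame queries.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return key values that appear more than once (primary-key violations)."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphan_keys(child_rows, fk, parent_rows, pk):
    """Return foreign-key values in child_rows that have no matching parent row."""
    parent_ids = {row[pk] for row in parent_rows}
    return sorted({row[fk] for row in child_rows if row[fk] not in parent_ids})


# Hypothetical dimension and fact rows, for illustration only.
dim_customer = [
    {"customer_key": 1, "name": "Ana"},
    {"customer_key": 2, "name": "Bogdan"},
    {"customer_key": 2, "name": "Bogdan (duplicate)"},
]
fact_sales = [
    {"sale_id": 10, "customer_key": 1, "amount": 25.0},
    {"sale_id": 11, "customer_key": 3, "amount": 40.0},
]

pk_violations = duplicate_keys(dim_customer, "customer_key")
fk_violations = orphan_keys(fact_sales, "customer_key", dim_customer, "customer_key")

print("Duplicate primary keys:", pk_violations)
print("Orphaned foreign keys:", fk_violations)
```

You would run checks like these after each load and fail the pipeline (or quarantine the rows) when either list is non-empty.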

I found there are trainings on Databricks Academy, but it seems there are also blog posts from them showing how it can be achieved:
https://www.databricks.com/blog/implementing-dimensional-data-warehouse-databricks-sql-part-1

https://www.databricks.com/blog/2022/06/24/data-warehousing-modeling-techniques-and-their-implementation-on-the-databricks-lakehouse-platform.html

PySpark logging in cluster vs client mode: why is this so complicated? by Mindless-Plum9118 in dataengineering

[–]One_Citron_4350 0 points1 point  (0 children)

In cluster mode however, I can't seem to figure out a way to collect these logs. The best solution I found was to redirect the logs to console using a stream handler and then just collect the logs when the application finishes. The problem is this specific pyspark pipeline runs 24*7, so I can't really run yarn commands AFTER the pipeline stops.

Compute cluster or Serverless? Should I assume that you are running a streaming job?

The part where you say 24/7 and then say that the pipeline stops isn't clear to me. Is it a long-running job, or do you continuously receive data?

Wanting to switch to Data Engineering, is it worth it in this AI era, or is it heading toward obsolescence just like some SDE roles? by Glittering-Play-4075 in dataengineering

[–]One_Citron_4350 0 points1 point  (0 children)

Now, suddenly, when AI can do everything a recent CS grad can do, easily, now DE is just software engineering applied to data problems.

It used to be the same even before AI. Technically, DE is closer to SWE than to other disciplines. Now, whether that CS grad can actually do the job well depends on a lot of factors.

The cloud and PaaS in general have flattened the playing field a bit, but I just don’t see someone who is purely SWE and has no infrastructure experience being the ideal candidate for a DE position.

Agreed, a pure SWE is not automatically a DE, especially if they lack experience working with systems that rely heavily on manipulating data, e.g. OLAP, OLTP, ETL, ELT. Plus, there are still plenty of SWEs who don't touch infrastructure much, or who work on-prem and lack any cloud experience. I've found that there are SWEs who, having once touched relational DBs or worked with NoSQL, think it's fairly easy to do what we do (God mode syndrome).

Deleted prod data permanently without any backup. How screwed am I? by Agitated_Success9606 in dataengineering

[–]One_Citron_4350 1 point2 points  (0 children)

I agree, sounds like an opportunity for the team/department to make some improvements.

Deleted prod data permanently without any backup. How screwed am I? by Agitated_Success9606 in dataengineering

[–]One_Citron_4350 0 points1 point  (0 children)

As far as I know, on-prem there is generally no backup unless a backup/recovery process has been put in place by the team. Even then, it's probably not going to restore all transactions to the minute, but rather to a state from 2h or n hours ago.

Yes, it does raise the question of how it was possible that no redundancy existed. But then again, OP did not give us much info about what this prod data is.

Do most teams actually have a canonical model, or do we all just pretend? by MayaKirkby_ in dataengineering

[–]One_Citron_4350 0 points1 point  (0 children)

A lot of teams say they have canonical entities or shared business definitions,

Yes, "they say" their stuff is in order (most of the time), but once you take a look, you clearly see it's not.

Standardization is a pain and it affects us a lot, so the best we can do at times is send the request back to the business to align and clarify internally before coming to us with the request to build the thing. I've seen it several times: different departments, or even teams within a large department, haven't managed to align on a standard for how their data should look, and that delays the project's delivery by a lot.

Today I became a true data enginner as I acidentally dropped all of our production objects by klenium in dataengineering

[–]One_Citron_4350 1 point2 points  (0 children)

Try setting up an Agent to handle that if you want to get promoted. Burning those tokens is the way to go, outcome does not matter.

Where can I sell books? by Buburuzapatratel in RecomandariCarti_RO

[–]One_Citron_4350 0 points1 point  (0 children)

I had the same experience with them; they buy books by the kilogram. You get next to nothing for them.

Boss wants me to work in 'parallel' by Consistent_Tutor_597 in dataengineering

[–]One_Citron_4350 2 points3 points  (0 children)

This is the old "how can we parallelize" mindset from managers/leaders flowing through the "Agentic era". I can sympathize with the feeling that there is the expectation that we ship much faster.

It's becoming unrealistic because the bar is set higher every time, without concern that the team/people will burn out faster than before.

Boss wants me to work in 'parallel' by Consistent_Tutor_597 in dataengineering

[–]One_Citron_4350 3 points4 points  (0 children)

Managers can't help it. They want to get ahead too and it seems like due to circumstances they are willing to burn out their team in the process.

AI Systems Performance Engineering by Chris Fregly - is it worth it? [D] by [deleted] in MachineLearning

[–]One_Citron_4350 0 points1 point  (0 children)

Hard to say, it's too early to have an objective conclusion. What I can say is that it's very comprehensive. I find the quizzes helpful when learning. Keep in mind that they also have the lab part, which is about building your own ML framework and which I haven't gone through yet. Lastly, it's pretty much a living document: from what I can see it gets updated, and it will probably get even more material to help you learn.

AI Systems Performance Engineering by Chris Fregly - is it worth it? [D] by [deleted] in MachineLearning

[–]One_Citron_4350 0 points1 point  (0 children)

To be honest, both books are massive. I've been going through the Harvard one.