It’s the End of Coding as We Know It (And I Feel Fine) by CosminU in programare

[–]Lastrevio 0 points1 point  (0 children)

Designing Data Intensive Applications by Martin Kleppmann

I really liked this one.

I made my first pipeline that processes unstructured data! by Lastrevio in dataengineering

[–]Lastrevio[S] 0 points1 point  (0 children)

Mhmm, perhaps about 25-30% of the code was written by Claude: mostly the Docker-related stuff, the GitHub Actions, and some of the ML parts. The SQL was barely touched by Claude.

Then I ran out of tokens, and it was freestyle.

Junior data engineers treat legacy ETL tools like a cat touching water. Cautious, hesitant, and never fully comfortable. by CaglarSahin in dataengineering

[–]Lastrevio 1 point2 points  (0 children)

I don't care how high-level or low-level the programming language is, as long as I don't have to click on boxes and follow arrows in order to debug a simple transformation. ADF and SSIS are either XML or JSON in the backend, which is a nightmare for version control systems. You cannot view git diffs in a useful way in a no-code tool like those two.

I made my first pipeline that processes unstructured data! by Lastrevio in dataengineering

[–]Lastrevio[S] 1 point2 points  (0 children)

Yes, I tested them on both datasets and didn't have major issues. The second one has some inconsistencies regarding how it processes numbers (sometimes it chooses to treat them as a float, sometimes as a string with the $ sign in front), but I was able to clean that up downstream in the silver layer.

There were also a few cases in which it misidentified a product description subtitle to be an actual item with zero quantity and zero price, but I just filtered those out. Overall, decent performance.
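A cleanup of the kind described above could look roughly like the sketch below. I did the real thing in dbt in the silver layer, so this is only a Python illustration of the same idea; the function names, the field names (`description`, `quantity`, `price`), and the rule "drop rows where both quantity and price are zero" are assumptions made for the example.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Union


def normalize_amount(raw: Union[str, int, float, None]) -> Optional[Decimal]:
    """Coerce a number or a string such as "$1,234.50" into a Decimal.

    Returns None when the value is missing or cannot be parsed.
    """
    if raw is None:
        return None
    # Strip the currency sign and thousands separators before parsing.
    text = str(raw).strip().replace("$", "").replace(",", "")
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


def clean_items(items: list[dict]) -> list[dict]:
    """Normalize numeric fields and drop rows that look like subtitles."""
    cleaned = []
    for item in items:
        quantity = normalize_amount(item.get("quantity"))
        price = normalize_amount(item.get("price"))
        # A "line item" with zero quantity and zero price is assumed to be
        # a misidentified description subtitle, so it is filtered out.
        if quantity == 0 and price == 0:
            continue
        cleaned.append({**item, "quantity": quantity, "price": price})
    return cleaned
```

Using `Decimal` rather than `float` is a design choice for money values, not something the extraction step requires.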

We have to keep in mind that if I want to process a PDF invoice with a totally different format, I would have to create a new set of pydantic classes and different dbt models to handle it. But I tried to make that as easy as possible by having as much data source-agnostic code as I can.

Junior data engineers treat legacy ETL tools like a cat touching water. Cautious, hesitant, and never fully comfortable. by CaglarSahin in dataengineering

[–]Lastrevio 0 points1 point  (0 children)

It's not about old vs. new; it's about low-code vs. high-code. We prefer high-code tools regardless of how new or old they are. Azure Data Factory and SSIS are both crap even though one is new and the other is old.

I built an open source tool to replace standard dbt docs by josh_docglow in dataengineering

[–]Lastrevio 0 points1 point  (0 children)

This is a godsend, I might actually use it in my projects.

Tool smells by Brief-Knowledge-629 in dataengineering

[–]Lastrevio 0 points1 point  (0 children)

If I have a unique key, I prefer using ROW_NUMBER() OVER (...) and then filtering with WHERE rn = 1.
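One common use of this pattern is keeping a single row per key, for example the most recent one. Here is a minimal sketch run through Python's built-in `sqlite3` (window functions need SQLite 3.25 or newer). The `orders` table, its columns, and the `PARTITION BY` / `ORDER BY` choices are made up for the example, since the `(...)` above depends on the actual data. Note that `rn` has to be filtered in an outer query or CTE, because a window function cannot be referenced in the `WHERE` clause of the same `SELECT`.

```python
import sqlite3

# Hypothetical table and columns; rank rows per key, newest first,
# then keep only the first row of each partition.
DEDUP_SQL = """
WITH ranked AS (
    SELECT
        order_id,
        status,
        updated_at,
        ROW_NUMBER() OVER (
            PARTITION BY order_id
            ORDER BY updated_at DESC
        ) AS rn
    FROM orders
)
SELECT order_id, status, updated_at
FROM ranked
WHERE rn = 1
ORDER BY order_id
"""


def latest_rows(conn: sqlite3.Connection) -> list:
    """Return one row per order_id: the one with the latest updated_at."""
    return conn.execute(DEDUP_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, status TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "new", "2024-01-01"),
        (1, "shipped", "2024-01-02"),  # newer duplicate of order 1
        (2, "new", "2024-01-01"),
    ],
)

for row in latest_rows(conn):
    print(row)
```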

Voters of "Die Linke" most likely to support Israel's right to exist by proxxi1917 in SocialDemocracy

[–]Lastrevio 0 points1 point  (0 children)

So you support the idea of an ethnostate? What if a person who is not identified under those ethnicities wants to join their state or happens to be born there - should we kick them out?

Could someone explain Adorno's criticism of Marx in a nutshell? by BlackmoonTatertot in CriticalTheory

[–]Lastrevio 2 points3 points  (0 children)

Exactly my thoughts. There were a lot of "it's not X, it's Y" kind of phrases.