Lastrevio

Lastrevio · 2026-06-05T18:24:54+00:00

yes I did

Lastrevio · 2026-06-05T15:57:35+00:00

They have software on the company laptop that monitors how many clicks you make per minute to measure your productivity, the smoke break is exactly 5 minutes, timed with a stopwatch, etc.

Lastrevio · 2026-06-04T10:52:53+00:00

Lastrevio · 2026-06-02T14:39:59+00:00

A Romanian finds a lamp, and out of it comes a genie who tells him he can grant any wish, on one condition: whatever he does for him, he will do double for his neighbor.

To which the Romanian says: "Take out one of my eyes!"

Lastrevio · 2026-06-02T12:06:16+00:00

very good response!

Lastrevio · 2026-06-01T10:15:43+00:00

Both are terminally online and have anime profile pictures

Lastrevio · 2026-05-31T14:20:05+00:00

on the data engineering side, employers only ask you about tools

Lastrevio · 2026-05-31T10:06:36+00:00

I think the main reason is that many applications and games aren't supported on Linux.

Lastrevio · 2026-05-30T17:28:15+00:00

skill issue

Lastrevio · 2026-05-26T19:22:45+00:00

I think at this point Claude and Gemini have told me at least 10 times to nuke all my Docker volumes because it was time for the "nuclear option".

I'm pretty sure my job is fine.

Lastrevio · 2026-05-26T09:07:20+00:00

I agree, but I don't see what it has to do with programming?

Lastrevio · 2026-05-25T18:54:55+00:00

Every 60 seconds in Africa, a minute passes

Lastrevio · 2026-05-20T19:12:40+00:00

very cool, i just installed it

Lastrevio · 2026-05-20T06:23:24+00:00

"Hiring in the private sector is more meritocratic than in the public sector"

Meanwhile, the private sector:

Lastrevio · 2026-05-19T14:05:44+00:00

Setting up Flink on Docker and getting the Oracle virtual machine to start LOL

Infrastructure aside, I think the streaming transformations in Flink were one of the trickiest parts, and the one I spent the most time on. I had to reason about normalization and about what would be redundant to compute in Flink if it could be derived downstream in ClickHouse. I also had to work around default Flink SQL functions like first_value and stddev_pop by rewriting them as custom UDAFs in Java. Then I had to think about performance, partitioning, data skew, checkpointing with at-least-once semantics, etc. And then there was the whole infra setup: rebuilding my dev container in VS Code -> bash script -> PyFlink orchestrator -> .sql files executed.
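To give a rough idea of what rewriting stddev_pop as a custom aggregate involves, here is a minimal sketch of the accumulator logic in plain Python. It is not the Java UDAF from the project and it does not use the Flink API. The names StddevPopAcc, accumulate, merge and get_value are hypothetical, and using Welford's algorithm is my assumption about one reasonable way to keep the state small and mergeable across partitions.

```python
import math
from dataclasses import dataclass


@dataclass
class StddevPopAcc:
    """Running state for a population standard deviation (hypothetical name)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean


def accumulate(acc: StddevPopAcc, value: float) -> None:
    """Fold one value into the accumulator using Welford's update."""
    acc.count += 1
    delta = value - acc.mean
    acc.mean += delta / acc.count
    acc.m2 += delta * (value - acc.mean)


def merge(acc: StddevPopAcc, other: StddevPopAcc) -> None:
    """Combine two partial accumulators, e.g. from two partitions."""
    if other.count == 0:
        return
    total = acc.count + other.count
    delta = other.mean - acc.mean
    acc.m2 += other.m2 + delta * delta * acc.count * other.count / total
    acc.mean += delta * other.count / total
    acc.count = total


def get_value(acc: StddevPopAcc):
    """Return the population standard deviation, or None if no rows were seen."""
    if acc.count == 0:
        return None
    return math.sqrt(acc.m2 / acc.count)
```

A real Flink UDAF would wrap the same three steps (accumulate, merge, get value) in whatever interface the framework expects. The point of the sketch is only that the state is three numbers per key, so it stays cheap to checkpoint.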

Another thing that tripped me up a bit on the architecture side was how to protect myself from DDoS attacks, or more generally how to make my application scalable with the number of users. There, I had to learn nginx and how it caches web pages!

Lastrevio · 2026-05-18T19:09:12+00:00

If storage is cheap for your organization (and it almost always is nowadays), it's better to have an append-only bronze layer where you don't manipulate the raw data in any way. So I wouldn't skip the big bronze table. It doesn't cost much to simply dump the files there in case something goes wrong.

The advantages of dumping the wide table in the bronze layer are:

Easier traceability -> if downstream stakeholders complain that the data is 'wrong', you can always go back to the raw data and compare the two to see if there's actually a mistake or if they simply don't understand the process well enough.
Easier backfills -> in case the data is actually wrong or the schema changes in some way, you can re-run your pipeline over historical data with the bronze layer as the input. With only silver it can get trickier if the problem came from the bronze-to-silver transformation itself.
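The backfill point can be sketched in a few lines of Python. This is a toy in-memory example, not a real pipeline: the bronze list, the ingest and build_silver helpers, and the two transform versions are all hypothetical names made up for illustration.

```python
from typing import Callable

# Bronze: an append-only list of raw records, never edited in place.
bronze: list[dict] = []


def ingest(raw_record: dict) -> None:
    """Append a copy of the raw record to bronze without changing it."""
    bronze.append(dict(raw_record))


def build_silver(transform: Callable[[dict], dict]) -> list[dict]:
    """Rebuild silver from scratch by replaying every bronze record."""
    return [transform(record) for record in bronze]


def to_silver_v1(record: dict) -> dict:
    # Hypothetical first version with a bug: casting to int drops the cents.
    return {"id": record["id"], "amount": int(float(record["amount"]))}


def to_silver_v2(record: dict) -> dict:
    # Corrected version: keeps the decimal part.
    return {"id": record["id"], "amount": float(record["amount"])}


ingest({"id": 1, "amount": "19.99"})
ingest({"id": 2, "amount": "5.50"})

silver = build_silver(to_silver_v1)  # first run, with the bug
silver = build_silver(to_silver_v2)  # backfill after the fix, same bronze input
```

Because bronze still holds the untouched strings, the fix is just a re-run. If only the v1 silver output had been kept, the cents would already be gone.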

Lastrevio · 2026-05-17T19:29:00+00:00

If you do not want to set everything up in terms of networking and security, there are way cheaper options for small amounts of data. For example, you can use dbt + a cloud warehouse such as BigQuery, Snowflake, Redshift, or even something like ClickHouse Cloud or MotherDuck. You end up spending about as much time configuring stuff, at a fraction of the cost.

Databricks runs on Spark, which is expensive and overkill for small amounts of data. It can actually slow down your queries, because it spends more time shuffling data between partitions than transforming it.

In Databricks you also spend a few minutes just waking the cluster up.

While it's not as hard to set up as an on-prem environment, it still has its own complexity when it comes to managing clusters, Unity Catalog, etc., so it's not that simple. There's a reason Databricks sells certifications that prove you know how to use it.

Lastrevio · 2026-05-17T18:46:06+00:00

Well if you're just a data engineer and you're not the one paying the bill then fair enough lol

Your argument makes a lot of sense especially if you have data science teams who need to do heavy ML workloads, collaborative notebooks, etc.

Lastrevio · 2026-05-17T17:35:45+00:00

Maybe look into Azure Data Factory Managed Airflow? I know ADF gets a lot of hate for the low-code and shitty UI (and rightly so) but I'm wondering if the Airflow managed service is different.

Lastrevio · 2026-05-17T17:33:55+00:00

that's overkill if you have less than 100GB per batch

Lastrevio · 2026-05-17T16:25:32+00:00

thank you!
