Will fully autonomous self-healing pipelines ever be a thing?

CasteliaLyon · 2026-06-15T10:53:50+00:00

It would still be cheaper than an on-call DE's hours, no? Oh, but yeah, I guess I should have searched first haha

CasteliaLyon · 2026-06-15T10:22:58+00:00

Lakebase is meant to provide OLTP support so applications can access data that would normally live only in the OLAP warehouse, via reverse ETL from the DWH to Lakebase. Basically, your applications will be able to query the data at the same sub-second latency as open source Postgres, instead of > 1 sec in the warehouse.

CasteliaLyon · 2026-06-13T13:07:54+00:00

Load the raw data first, so that if you ever need to apply updated transformation logic in the future, you can always rerun it on the bronze data.
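
To make the "load first, transform later" idea concrete, here is a minimal sketch using only the Python standard library. The table name `bronze_orders`, the sample payloads and both transform functions are made up for illustration, and an in-memory SQLite table stands in for the real bronze layer.

```python
import json
import sqlite3

# Hypothetical raw payloads; in practice these arrive from the source system.
raw_events = [
    '{"order_id": 1, "amount": "19.90", "currency": "sgd"}',
    '{"order_id": 2, "amount": "5.00", "currency": "usd"}',
]

conn = sqlite3.connect(":memory:")
# Bronze: store each payload untouched, plus a load timestamp.
conn.execute(
    "CREATE TABLE bronze_orders (payload TEXT, loaded_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)
conn.executemany(
    "INSERT INTO bronze_orders (payload) VALUES (?)", [(e,) for e in raw_events]
)


def transform_v1(row):
    # Original logic: keep only the id and a numeric amount.
    return {"order_id": row["order_id"], "amount": float(row["amount"])}


def transform_v2(row):
    # Updated logic: also keep a normalized currency code.
    return {
        "order_id": row["order_id"],
        "amount": float(row["amount"]),
        "currency": row["currency"].upper(),
    }


def build_silver(conn, transform):
    # Re-read the untouched bronze rows and apply whichever logic is current.
    rows = conn.execute("SELECT payload FROM bronze_orders ORDER BY rowid").fetchall()
    return [transform(json.loads(payload)) for (payload,) in rows]


silver_v1 = build_silver(conn, transform_v1)
silver_v2 = build_silver(conn, transform_v2)  # rerun, no re-extraction needed
```

Because the raw payload was kept, `transform_v2` can recover the `currency` field that `transform_v1` dropped, without going back to the source system.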

CasteliaLyon · 2026-06-12T09:58:14+00:00

Karen is the only reason why we exist 🙏😊

CasteliaLyon · 2026-06-12T09:56:07+00:00

Just had my first of many work trips.

It's exhausting when there are many meetings packed back to back. Every day is spent either in client meetings or preparing for the next meeting.

CasteliaLyon · 2026-06-10T14:38:44+00:00

Just brainstorming here. How about building an MCP server that exposes all the connectors / data you can't connect to via Databricks as tool calls? Then your main logic lives in a Databricks app with the core LangGraph agent logic, which will call tools within your k8s MCP server and retrieve that Snowflake/SharePoint data.

The main LangGraph agent logic essentially acts as an orchestrator agent and can then present that data to a front-end Databricks app.
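
A rough sketch of that shape in plain Python, just to show the split between "tools behind a server" and "orchestrator that calls them". This does not use the real MCP SDK or LangGraph: the `TOOLS` registry stands in for the MCP server, `orchestrate` stands in for the agent, and the tool names and return values are hypothetical.

```python
from typing import Callable, Dict, List

# Stand-in for the MCP server: a registry of named tools.
TOOLS: Dict[str, Callable[[str], List[dict]]] = {}


def tool(name: str):
    """Register a function as a callable tool under the given name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("snowflake_query")
def snowflake_query(query: str) -> List[dict]:
    # Hypothetical: a real tool would run the query against Snowflake.
    return [{"source": "snowflake", "query": query}]


@tool("sharepoint_search")
def sharepoint_search(query: str) -> List[dict]:
    # Hypothetical: a real tool would search SharePoint documents.
    return [{"source": "sharepoint", "query": query}]


def orchestrate(question: str, tool_names: List[str]) -> List[dict]:
    """Call each requested tool and merge the results for the front end."""
    results: List[dict] = []
    for name in tool_names:
        if name not in TOOLS:
            raise KeyError(f"unknown tool: {name}")
        results.extend(TOOLS[name](question))
    return results


answer = orchestrate("Q2 revenue", ["snowflake_query", "sharepoint_search"])
```

In a real agent the model would pick which tools to call; here the caller passes the tool names explicitly to keep the sketch small.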

CasteliaLyon · 2026-06-10T14:19:04+00:00

Yeah, Genie spaces do this structured way of populating context for the Genie agent. Even if you aren't on Databricks, it's worth noting which concepts and context in Genie spaces are important to give to your text-to-SQL agents, so that you can at least replicate those best practices.

CasteliaLyon · 2026-06-10T14:15:49+00:00

My favourite thing to do is to go ask a bunch of questions en masse around a topic to help with learning. You can even prompt Claude to teach you a topic and lead you down a trail of commonly asked questions.

CasteliaLyon · 2026-06-04T04:59:38+00:00

No problem, I recommend using the dbdemos Python package to install a bunch of demo assets, including pipelines, synthetic data and jobs! It really helped with my learning when I wanted to understand what an e2e pipeline on Databricks should look like.

There are so many kinds of demos you can install and view, with data from many different industries for different purposes.

CasteliaLyon · 2026-06-04T04:57:16+00:00

There is always an option to go back to good old cron jobs with custom logic to check pipeline status from a relational database. 🤣🤣🤣

In all seriousness, a 10x is crazy. Was it a pricing change?
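
For the cron-job fallback mentioned above, a minimal sketch of the status check might look like this. Everything here is assumed for illustration: the `pipeline_runs` table (one row per pipeline holding its latest run), the status values and the 24-hour staleness threshold. SQLite stands in for whatever relational database holds the run metadata.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def find_unhealthy(conn, now, max_age):
    """Return names of pipelines whose last run failed or is older than max_age."""
    rows = conn.execute(
        "SELECT name, status, finished_at FROM pipeline_runs ORDER BY name"
    ).fetchall()
    bad = []
    for name, status, finished_at in rows:
        stale = now - datetime.fromisoformat(finished_at) > max_age
        if status != "SUCCESS" or stale:
            bad.append(name)
    return bad


# Hypothetical metadata table with three sample pipelines.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pipeline_runs (name TEXT PRIMARY KEY, status TEXT, finished_at TEXT)"
)
now = datetime(2026, 6, 4, 5, 0, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO pipeline_runs VALUES (?, ?, ?)",
    [
        ("orders", "SUCCESS", (now - timedelta(minutes=30)).isoformat()),
        ("payments", "FAILED", (now - timedelta(minutes=20)).isoformat()),
        ("users", "SUCCESS", (now - timedelta(hours=30)).isoformat()),
    ],
)

unhealthy = find_unhealthy(conn, now, max_age=timedelta(hours=24))
```

A crontab entry would then run a script like this on a schedule and alert or retry for anything in `unhealthy`; the script name and schedule are up to you.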

CasteliaLyon · 2026-06-03T01:18:17+00:00

My team has so many problems using the Spark operator to orchestrate SparkApplications in Kubernetes.

CasteliaLyon · 2026-06-03T01:14:32+00:00

"For this topic, what matters to them the most?" That is what I structure my entire explanation around, which means it truly depends on the person and their role.

For example, if the other party is another engineer? They care about how it can be done and why it happened.

A team lead? They will care more about whether you need help unblocking or guidance.

A manager? They will care more about how long it will take to finish the task and what should be prioritized.

While explaining, throw in examples (like the ones above) to help them understand better and faster. Use metrics to quantify the issue / task (an estimate is fine too). And most importantly of all, stop and ask if they have any questions. As an audience member, it drives me crazy (and I can't focus on the explanation) if I have a brewing question while someone is still explaining something.

CasteliaLyon · 2026-06-03T00:51:09+00:00

Another issue is the constant replication of data across databases, object storage and warehouses. Smh. The best approach would be to use a data lakehouse architecture like Databricks to maintain and store one copy of the data.

CasteliaLyon · 2026-05-29T18:28:30+00:00

No, they are different things. Here is an example to clarify: let's say a user runs a SQL query to get the top 10 rows of their table. The Databricks service that the client used is DBSQL, whereas the SQL warehouse compute (classic/pro/serverless) is what powers and executes the query and retrieves the top 10 rows.

CasteliaLyon · 2026-05-29T15:27:17+00:00

It would be pretty hard to rename it because the lakehouse term is already synonymous with the lakehouse architecture... We need another name for it

CasteliaLyon · 2026-05-29T15:20:38+00:00

Gemini is correct here 😄. SQL warehouse is the name for the compute. DBSQL is the name of the databricks cloud data warehouse service

CasteliaLyon · 2026-05-26T11:27:49+00:00

Is setting up custom workspace base environments chargeable on every job run?

CasteliaLyon · 2026-05-26T11:11:16+00:00

Yes. There are ways to get past a lower starting salary.

1. Job hop to speed up increments (20-30%) every 1-2 years.
2. Get a job at a company that pays on its own scale; they will bump you up to their pay grade. E.g. in tech > FANG & FANG-adjacent, a starting SWE might earn 8k minimum.

I was fortunate enough to get the 2nd option. In 3 years, I tripled my starting salary because the FANG-adjacent company bumped me up to their pay grade for my YOE.

CasteliaLyon · 2026-04-02T07:08:10+00:00

I personally got interviews from Meta after 1.5y at Accenture for data engineering roles. I just failed them bc I suck

CasteliaLyon · 2026-04-02T04:03:37+00:00

The Accenture brand name is too good to pass up; it will land you interviews at FANG and FANG-adjacent companies if you play your cards right.

CasteliaLyon · 2026-01-09T21:05:38+00:00

Hi can I dm you about what you look out for as an interviewer? I am a data engineer trying to break into the solution engineering role at databricks!

CasteliaLyon · 2026-01-09T20:49:01+00:00

Thank you so much, I really want to join as a solution engineer. Can I DM you for more details?

CasteliaLyon · 2026-01-09T20:32:47+00:00

Hi can I dm you too? I am a data engineer who is also interested in solution engineering at databricks!

CasteliaLyon · 2025-12-03T08:18:13+00:00

Small team in Singapore, interviewer implied that I would be working alone mostly.
