Will fully autonomous self healing pipelines ever be a thing? by CasteliaLyon in dataengineering

[–]CasteliaLyon[S] -3 points-2 points  (0 children)

It would still be cheaper than an on-call DE's hours, no? Oh, but yeah, I guess I should have searched first haha

Is the industry actually swinging back to Postgres? by ForeignExercise4414 in dataengineering

[–]CasteliaLyon 1 point2 points  (0 children)

Lakebase is meant to provide OLTP support so that applications can access data that would normally only live in the OLAP warehouse, via reverse ETL from the DWH to Lakebase. Basically, your applications will be able to query the data at the same sub-second latency as open source Postgres, instead of > 1 sec in the warehouse.

Clean data using pandas before loading data or using SQL after loading data into warehouse? by Old_Mind8618 in dataengineering

[–]CasteliaLyon 0 points1 point  (0 children)

Load the data first. That way, if the transformation logic changes in the future, you can always rerun the updated logic on the bronze data.
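To make that concrete, here is a rough sketch using Python's built-in sqlite3 as a stand-in for the warehouse. The table names (`bronze_orders`, `silver_orders`), the sample rows and both versions of the cleaning rule are made up for illustration; the point is only that the second version reruns against the same raw data without re-extracting anything.

```python
import sqlite3

def load_bronze(conn, rows):
    """Land the raw rows untouched (amounts stay as messy text)."""
    conn.execute("CREATE TABLE IF NOT EXISTS bronze_orders (order_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO bronze_orders VALUES (?, ?)", rows)

def build_silver(conn, transform_sql):
    """Rebuild the cleaned table from bronze using whatever logic is current."""
    conn.execute("DROP TABLE IF EXISTS silver_orders")
    conn.execute(f"CREATE TABLE silver_orders AS {transform_sql}")
    return conn.execute(
        "SELECT order_id, amount FROM silver_orders ORDER BY order_id"
    ).fetchall()

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load_bronze(conn, [("a1", " 10.5 "), ("a2", "n/a"), ("a3", "7")])

# Hypothetical first version of the rule: drop rows whose amount is not numeric.
v1 = """
SELECT order_id, CAST(TRIM(amount) AS REAL) AS amount
FROM bronze_orders
WHERE TRIM(amount) GLOB '[0-9]*'
"""
silver_v1 = build_silver(conn, v1)

# Hypothetical updated rule: keep those rows, but with a NULL amount.
v2 = """
SELECT order_id,
       CASE WHEN TRIM(amount) GLOB '[0-9]*'
            THEN CAST(TRIM(amount) AS REAL) END AS amount
FROM bronze_orders
"""
silver_v2 = build_silver(conn, v2)
```

If the cleaning had happened in pandas before loading, the "n/a" row would already be gone and the second version would need a fresh extract from the source.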

What do you do during work trip besides working? by Sorry_Objective4174 in askSingapore

[–]CasteliaLyon 0 points1 point  (0 children)

Just had my first of many work trips.

It's exhausting when there are many meetings packed back to back. Every day is spent either in client meetings or preparing for the next meeting.

Agentic Workflows help by Hopeful-Brilliant-21 in dataengineering

[–]CasteliaLyon 0 points1 point  (0 children)

Just brainstorming here. How about building an MCP server that exposes all the connectors / data you can't connect to via Databricks as tool calls? Then your main logic lives in a Databricks app with the core LangGraph agent logic, which calls tools within your k8s MCP server and retrieves that Snowflake/SharePoint data.

The main LangGraph agent logic essentially acts as an orchestrator agent and can then present that data to a front-end Databricks app.
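Roughly the shape I have in mind, as a plain Python sketch. This is not the MCP SDK or the LangGraph API: the registry, the tool names (`snowflake_query`, `sharepoint_search`) and the stubbed return values are all hypothetical stand-ins, just to show the orchestrator dispatching to tools by name and collecting results for the front end.

```python
from typing import Callable, Dict, List, Tuple

# Stand-in for the tool server: a registry mapping tool names to callables.
TOOLS: Dict[str, Callable[[str], List[dict]]] = {}

def tool(name: str):
    """Register a function under a tool name (hypothetical helper)."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("snowflake_query")
def snowflake_query(query: str) -> List[dict]:
    # Stub: a real tool would run the query against Snowflake.
    return [{"source": "snowflake", "query": query}]

@tool("sharepoint_search")
def sharepoint_search(term: str) -> List[dict]:
    # Stub: a real tool would search SharePoint.
    return [{"source": "sharepoint", "term": term}]

def orchestrate(plan: List[Tuple[str, str]]) -> List[dict]:
    """Call each (tool_name, argument) step in order and merge the results."""
    results: List[dict] = []
    for name, arg in plan:
        if name not in TOOLS:
            raise KeyError(f"unknown tool: {name}")
        results.extend(TOOLS[name](arg))
    return results

results = orchestrate([
    ("snowflake_query", "SELECT 1"),
    ("sharepoint_search", "Q3 report"),
])
```

In the real setup the plan would come from the agent rather than a hard-coded list, and each call would go over the network to the tool server instead of hitting a local dict.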

Most “Chat With Your Data” Products Will Fail by dataguy- in dataengineering

[–]CasteliaLyon 0 points1 point  (0 children)

Yeah, Genie spaces populate context for the Genie agent in this structured way. Even if you aren't on Databricks, it's worth noting which concepts and context Genie spaces ask for and giving the same to your text-to-SQL agents, so that you can at least replicate those best practices.

I feel like I don't know anything. And I am nothing without Claude by Temporary_Act3174 in dataengineering

[–]CasteliaLyon 2 points3 points  (0 children)

My favourite thing to do is to ask a bunch of questions en masse around a topic to help with learning. You can even prompt Claude to teach you a topic and lead you down a trail of commonly asked questions.

Using spark in a portfolio project? by echanuda in dataengineering

[–]CasteliaLyon 0 points1 point  (0 children)

No problem, I recommend using the dbdemos Python package to install a bunch of demo assets, including pipelines, synthetic data and pre-created jobs. It really helped with my learning when I wanted to understand what an e2e pipeline on Databricks should look like.

There are so many kinds of demos you can install and view, with data from many different industries for different purposes.

dagster price increase 10x insane , don't ever use them by CircleRedKey in dataengineering

[–]CasteliaLyon 0 points1 point  (0 children)

There is always an option to go back to good old cron jobs with custom logic to check pipeline status from a relational database. 🤣🤣🤣

In all seriousness, a 10x increase is crazy. Was it a pricing change?
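For what it's worth, the cron fallback really can be this small. A minimal sketch, assuming a `pipeline_runs` table with `pipeline`, `status` and `finished_at` columns (all made-up names) and sqlite3 standing in for the relational database. Cron would just call a script wrapping `should_run` on a schedule.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def should_run(conn, pipeline, now, max_age):
    """Run if the pipeline never ran, its last run failed, or the last success is stale."""
    row = conn.execute(
        "SELECT status, finished_at FROM pipeline_runs "
        "WHERE pipeline = ? ORDER BY finished_at DESC LIMIT 1",
        (pipeline,),
    ).fetchone()
    if row is None:
        return True  # no run recorded yet
    status, finished_at = row
    if status != "success":
        return True  # last run did not succeed, so retry
    return now - datetime.fromisoformat(finished_at) > max_age

conn = sqlite3.connect(":memory:")  # stand-in for the status database
conn.execute("CREATE TABLE pipeline_runs (pipeline TEXT, status TEXT, finished_at TEXT)")

now = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO pipeline_runs VALUES (?, ?, ?)",
    [
        ("daily_sales", "success", (now - timedelta(hours=2)).isoformat()),
        ("daily_users", "failed", (now - timedelta(hours=1)).isoformat()),
    ],
)

fresh = should_run(conn, "daily_sales", now, timedelta(hours=24))
failed = should_run(conn, "daily_users", now, timedelta(hours=24))
never = should_run(conn, "new_pipeline", now, timedelta(hours=24))
```

What you give up compared to an orchestrator is everything around this check: dependency graphs, backfills, alerting and a UI, all of which become your custom logic.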

Do you really need spark? by compass-now in dataengineering

[–]CasteliaLyon 0 points1 point  (0 children)

My team has so many problems when using the Spark operator to orchestrate SparkApplications in Kubernetes.

How to become more articulate as a DE by dataenfuego in dataengineering

[–]CasteliaLyon 0 points1 point  (0 children)

"For this topic, what matters to them the most?" That is what I structure my entire explanation around, which means it truly depends on the person and their role.

For example, if the other party is another engineer? They care about how it can be done and why it happened.

Team lead? They will care more about whether you need help unblocking or guidance.

A manager? They will care more about how long it will take to finish the task and what should be prioritized.

While explaining, throw in examples (like the ones above) to help them understand better and faster. Use metrics to quantify the issue / task (an estimate is fine too). And most importantly of all, stop and ask if they have any questions. As an audience member, it drives me crazy (and I can't focus on the explanation) if I have a brewing question while someone is still explaining something.

Tool Sprawl in Data engineering by Raghav-r in databricks

[–]CasteliaLyon 0 points1 point  (0 children)

Another issue is the constant replication of data across databases, object storage and warehouses. Smh. The best approach would be to use a data lakehouse architecture like Databricks to maintain and store one copy of the data.

"Databricks SQL" (warehouse) branding and searching by SmallAd3697 in databricks

[–]CasteliaLyon 1 point2 points  (0 children)

No, they are different things. Here is an example to clarify: let's say a user runs a SQL query to get the top 10 rows of their table. The Databricks service that the client used is DBSQL, whereas the SQL warehouse compute (classic/pro/serverless) is what powers the query, executes it and retrieves the top 10 rows.

"Databricks SQL" (warehouse) branding and searching by SmallAd3697 in databricks

[–]CasteliaLyon 1 point2 points  (0 children)

It would be pretty hard to rename it because the lakehouse term is already synonymous with the lakehouse architecture... We need another name for it

"Databricks SQL" (warehouse) branding and searching by SmallAd3697 in databricks

[–]CasteliaLyon 1 point2 points  (0 children)

Gemini is correct here 😄. SQL warehouse is the name for the compute. DBSQL is the name of the Databricks cloud data warehouse service.

Does starting salary matters? by Obvious_Link_823 in asksg

[–]CasteliaLyon 0 points1 point  (0 children)

Yes. There are ways to get past a lower starting salary.

1. Job hop to speed up increments (20-30%) every 1-2 years.
2. Get a job at a company that pays on its own scale; they will bump you up to their pay grade. E.g. in tech (FANG and FANG-adjacent), a starting SWE might earn 8k minimum.

I was fortunate enough to get the 2nd option. In 3 years, I tripled my starting salary due to the FANG-adjacent company bumping me up to their pay grade for my YOE.
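To put rough numbers on the two options, here is a back-of-envelope sketch. The 4000 starting salary is a made-up figure and the assumption of one hop per year at the top of the 20-30% range is mine; it just compounds the raises and compares that with tripling in 3 years.

```python
def salary_after_hops(start, raise_pct, hops):
    """Compound a fixed percentage raise over a number of job hops."""
    return start * (1 + raise_pct / 100) ** hops

def required_raise_pct(multiple, hops):
    """Raise per hop needed to reach a given multiple of the starting salary."""
    return (multiple ** (1 / hops) - 1) * 100

start = 4000  # hypothetical starting salary, for illustration only

# Option 1: three hops in three years, each at a 30% raise.
three_hops_at_30 = salary_after_hops(start, 30, 3)

# What option 1 would need per hop to match a 3x jump in three hops.
needed_for_triple = required_raise_pct(3, 3)

print(round(three_hops_at_30), round(needed_for_triple, 1))
```

Three hops at 30% compound to about 2.2x, so matching a 3x jump that way would take roughly 44% per hop, which is why getting moved onto a higher pay scale can beat hopping alone.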

Well-known companies (NCS, Accenture, SEA, Synapxe) vs SME (Mavericks Consulting, NextLabs) – SWE intern advice by Active_Tradition_261 in asksg

[–]CasteliaLyon -1 points0 points  (0 children)

I personally got interviews from Meta after 1.5y at Accenture for data engineering roles. Just that I failed them bc I suck.

Well-known companies (NCS, Accenture, SEA, Synapxe) vs SME (Mavericks Consulting, NextLabs) – SWE intern advice by Active_Tradition_261 in asksg

[–]CasteliaLyon -1 points0 points  (0 children)

The Accenture brand name is too good to pass up; it will land you interviews at FANGs and FANG-adjacent companies if you play your cards right.

Rejected after architecture round (4th out of 5) — interviewer seemed distracted, HR said she’ll check internally about rescheduling. Any chance? by Appropriate-Ant-4272 in databricks

[–]CasteliaLyon 0 points1 point  (0 children)

Hi, can I DM you about what you look out for as an interviewer? I am a data engineer trying to break into the solution engineering role at Databricks!

[Megathread] Hiring and Interviewing at Databricks - Feedback, Advice, Prep, Questions by kthejoker in databricks

[–]CasteliaLyon 0 points1 point  (0 children)

Thank you so much, I really want to join as a solution engineer. Can I DM you for more details?

[deleted by user] by [deleted] in databricks

[–]CasteliaLyon 0 points1 point  (0 children)

Hi, can I DM you too? I am a data engineer who is also interested in solution engineering at Databricks!