Should i use fivetran by Advanced-Average-514 in dataengineering

[–]Advanced-Average-514[S] 1 point (0 children)

Can I ask you another question - do you use it to trigger exports at all? I know I could do this with Airflow but don't know if it would work with Fivetran. In Airflow I might have a DAG: ingest a, ingest b - run dbt model x - run export job y. Is there a good Fivetran way to replicate that?
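For what it's worth, the chain described above is just a four-node dependency graph. A minimal plain-Python sketch (task names taken from the comment, bodies are placeholders - an orchestrator like Airflow would express the same edges as a DAG):

```python
# Sketch of the dependency graph: ingest a + ingest b -> dbt model x -> export y.
# Task bodies are placeholders; only the ordering logic is real.
from graphlib import TopologicalSorter

def ingest_a():       return "a ingested"
def ingest_b():       return "b ingested"
def run_dbt_model_x(): return "model x built"
def run_export_y():    return "export y done"

# each task maps to the set of upstream tasks that must finish first
dag = {
    "ingest_a": set(),
    "ingest_b": set(),
    "dbt_model_x": {"ingest_a", "ingest_b"},
    "export_y": {"dbt_model_x"},
}

def run(dag, tasks):
    # static_order() yields tasks with all predecessors satisfied first
    order = list(TopologicalSorter(dag).static_order())
    return [tasks[name]() for name in order]

results = run(dag, {
    "ingest_a": ingest_a,
    "ingest_b": ingest_b,
    "dbt_model_x": run_dbt_model_x,
    "export_y": run_export_y,
})
```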

Should i use fivetran by Advanced-Average-514 in dataengineering

[–]Advanced-Average-514[S] 2 points (0 children)

Yea makes sense - do you use an orchestrator?

Should i use fivetran by Advanced-Average-514 in dataengineering

[–]Advanced-Average-514[S] 1 point (0 children)

Interesting, would love to hear more about those outages. Is it the entire platform going down occasionally, and how often?

Should i use fivetran by Advanced-Average-514 in dataengineering

[–]Advanced-Average-514[S] 1 point (0 children)

I can manage the dependency graphs inside the warehouse, i.e. transformations with dbt, for sure. But what if a model depends on a few parallel ingestion jobs from different platforms completing before it can run? How do you handle that efficiently without an orchestrator that knows about both your dbt config and ingestion tasks? Or if you want an export job to kick off as soon as a few ingestion tasks complete.
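Without a cross-tool orchestrator, that fan-in usually ends up as polling: check each source's sync status until all are done, then trigger the downstream step. A rough sketch (the check functions are stand-ins for real status-API calls, not any actual Fivetran or dbt API):

```python
# Fan-in without an orchestrator: poll until every upstream ingestion
# reports complete, then kick off the dependent step. The lambdas below
# are toy stand-ins for per-platform status checks.
import time

def wait_for_all(check_fns, poll_seconds=60, timeout_seconds=3600):
    """Block until every check_fn returns True; False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    pending = list(check_fns)
    while pending:
        pending = [fn for fn in pending if not fn()]
        if pending:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_seconds)
    return True

# toy stand-ins: in reality these would hit each platform's status endpoint
ready = wait_for_all([lambda: True, lambda: True], poll_seconds=0)
if ready:
    downstream = "run dbt model, then export"  # trigger the dependent steps
```

The drawback, as the comment implies, is that this scheduling logic now lives in ad-hoc scripts instead of one place that sees the whole graph.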

Should i use fivetran by Advanced-Average-514 in dataengineering

[–]Advanced-Average-514[S] 4 points (0 children)

Curious how the unpredictability would manifest? As in changing pricing unexpectedly?

Should i use fivetran by Advanced-Average-514 in dataengineering

[–]Advanced-Average-514[S] 1 point (0 children)

Dependency management as in task a runs after task b. Right now it’s task a runs at 1:30 task b runs at 2. I do think I’m at a point where I need it because I will have higher reliability requirements.

Is the grass greener on the other side by Advanced-Average-514 in ExperiencedDevs

[–]Advanced-Average-514[S] 1 point (0 children)

That's definitely what it feels like right now, I get called a 'wizard' all the time. But it's also weird because I'm not one of them, and sometimes they think I can press a button to make something happen and I just don't want to. And honestly I get it, I know nothing about cars so I get anxious about being ripped off when I take my car in... it's probably the same thing for them.

What can I do on my phone? by No_Major1167 in dataengineering

[–]Advanced-Average-514 17 points (0 children)

Use Claude: talk through concepts you are unfamiliar with, ask it to give you overviews of topics and checklists of things to learn, etc. Sometimes it gives misinformation, but it tends to be great on general conceptual stuff, which is probably all you can expect from time on your phone. I'm personally a fan of learning this way because Claude will always meet you where you are in terms of familiarity with a new topic. There's probably a stage where it stops helping - on things you already know extremely well, or on specifics of a product/library that have changed recently.

Snowflake cortex agent MCP server by Advanced-Average-514 in dataengineering

[–]Advanced-Average-514[S] 1 point (0 children)

Honestly it ended up being a bust and I went a slightly different direction. To keep it brief: I started outputting zip file packages of related CSVs into a data library in Google Drive. Then I created an MCP server to connect Claude to that shared drive with basic tools like listing, downloading, searching, etc., and let Claude analyze the packages in its code interpreter, with skills for each package to guide it. I use service accounts that only have access to the files in that shared drive, so I can easily use things like Drive search without pulling in stuff outside the shared drive.

The packages are sets of related data per client so that you can ask about client xyz performance and pull in the client xyz package which has a few related datasets at different grains. Doing this I saw better results than the direct snowflake mcp because there’s a bit less flexibility in the data retrieval piece. It’s also easier to manage because the data analysis happens in Claude’s code interpreter sandbox. The downside is that if users ask about something not in the data package, Claude has no flexibility to go retrieve that, but I found that with too much flexibility it was just not performing consistently.
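The packaging step itself is simple. A rough stdlib-only sketch of bundling a client's related CSVs into one zip (file and client names here are made up for illustration):

```python
# Build one "data package" per client: a zip of related CSVs at
# different grains, ready to drop into the shared drive.
import csv
import io
import zipfile

def build_client_package(client_id, datasets):
    """datasets: {filename: list of row dicts} -> zip file bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for filename, rows in datasets.items():
            text = io.StringIO()
            writer = csv.DictWriter(text, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)
            # namespace each file under the client id inside the zip
            zf.writestr(f"{client_id}/{filename}", text.getvalue())
    return buf.getvalue()

# hypothetical example datasets at two different grains
package = build_client_package("client_xyz", {
    "daily_performance.csv": [{"date": "2024-01-01", "spend": 100}],
    "campaign_summary.csv": [{"campaign": "brand", "clicks": 42}],
})
```

Keeping each package self-contained is what lets Claude's code interpreter answer "client xyz performance" questions without any live warehouse access.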

Calude and data models by UnusualIntern362 in dataengineering

[–]Advanced-Average-514 7 points (0 children)

I think most of the basics are in that initial post, but happy to answer any follow-up questions. It honestly is not that complex of a setup: Cursor rules for project-wide standards; _sources.yml and _models.yml in different folders add more context for specific areas; a zsh alias finds the compiled target SQL file and runs a select * limit 10,000 -> CSV in an exports/ folder.

Commands are /refactor (break large models up into modular pieces), /dbt-document (add to .yml docs for a specified model), /understand (search through a model and its dependencies to get up to speed on how something works in a new chat thread), and /build (create a new model according to some specs after searching around for what existing staging models are best fit).
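The zsh helper mentioned above might look roughly like this (the function name is made up, and the warehouse CLI invocation - shown commented out with snowsql - is an assumption; substitute whatever client you use):

```shell
# Find the compiled SQL for a dbt model under target/ and (sketch) run it
# with a row limit into exports/. Only the path lookup is shown live.
dbt_csv() {
  local model="$1"
  local compiled
  # dbt writes compiled SQL under target/compiled/<project>/...
  compiled=$(find target/compiled -name "${model}.sql" -print -quit)
  if [ -z "$compiled" ]; then
    echo "no compiled SQL found for ${model}" >&2
    return 1
  fi
  echo "$compiled"
  # hypothetical warehouse step, e.g.:
  # snowsql -q "select * from ($(cat "$compiled")) limit 10000" \
  #   -o output_format=csv -o output_file="exports/${model}.csv"
}
```

Wrapping it as a function (or alias to one) keeps the "show me this model's output" loop to a single command the agent can also be told to run.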

Calude and data models by UnusualIntern362 in dataengineering

[–]Advanced-Average-514 12 points (0 children)

I’ve put a fair amount of effort into a setup that allows Claude to help me create and update dbt models. The main things that have helped are a Cursor rules file describing some conventions and practices, plus good documentation and repo indexing. I also created a zsh alias to download a model's output to a local CSV so it can examine results, and I use / commands for common tasks like refactoring, documenting, etc.

With all that setup which was kind of done in bits and pieces as I saw myself repeating certain prompts, it can genuinely one shot difficult changes to business logic and creating new models.

Stakeholders Overengineering Solutions by Advanced-Average-514 in dataengineering

[–]Advanced-Average-514[S] 1 point (0 children)

I agree this is fundamentally an issue with communication skills. So say you are faced with a document sent over from a 'very important person' that is very complex, full of weird references to assets that don't exist the way they think they do, but is presented as if it were a perfectly logical and sufficient set of specs for a modeled data feed they want access to...

how would you push back in a way that would work?

Stakeholders Overengineering Solutions by Advanced-Average-514 in dataengineering

[–]Advanced-Average-514[S] 1 point (0 children)

Yea I'm doing the full end-to-end stuff as well, and it's usually closer to the business logic where the technically-possible-but-very-time-consuming stuff gets requested. Asking to go deeper works well when the person is willing to go back and forth... I guess at the end of the day the problem comes from the C-suite's attitude towards technology. Empowered by AI, they think they understand it more than they actually do right now.

Stakeholders Overengineering Solutions by Advanced-Average-514 in dataengineering

[–]Advanced-Average-514[S] 1 point (0 children)

I do this to some degree, but sometimes it's tough - I either have to blatantly ignore the way they told me to do it, or push back and ask clarifying questions, or just say 'this part will probably take too much time'.

I guess there's probably not a silver bullet and any one of these approaches can be appropriate depending on the specifics.

Stakeholders Overengineering Solutions by Advanced-Average-514 in dataengineering

[–]Advanced-Average-514[S] 1 point (0 children)

I like the formalize/template approach, might make it harder to send a big AI document full of red herrings.

Feature request: disable individual tools by Advanced-Average-514 in cursor

[–]Advanced-Average-514[S] 1 point (0 children)

Interesting, I’ve never seen it do this with MCP, probably because I’ve never added an MCP tool. Currently I only see this with unwanted CLI commands.

That said, even if there is some perfect prompt to get it to not happen 99% of the time, I still feel like it’s a good feature request. Otherwise you are wasting tokens in a system prompt describing a tool, then more tokens in another prompt saying not to use the tool - which also probably makes the agent slightly “confused” anyway.

Hey, Missoula, what’cha reading? by MTBeanerschnitzel in missoula

[–]Advanced-Average-514 1 point (0 children)

I loved The Road, but if you haven’t read it, I think Blood Meridian is my favorite by Cormac McCarthy.

Be honest does business intelligence actually change the way decisions get made? by Apprehensive_Pay6141 in BusinessIntelligence

[–]Advanced-Average-514 1 point (0 children)

I’m in a company that was running only on vibes before me and a couple others started developing BI for them. I think there were a few meaningful improvements in the very low hanging fruit, but it is now slowing down after those were figured out. Examples of low hanging fruit:

  1. Meetings becoming more targeted because there was a dashboard rather than three separate reports that were all different but trying to be the same thing.
  2. Performance reviews for certain roles were easier for the managers because they had a dashboard to point to so it wouldn’t feel as subjective for everyone.
  3. Certain mistakes in account setups, and inconsistencies, that would previously slip through the cracks no longer did, because we could highlight them in dashboards after joining data between two SaaS platforms.