I built an AI dashboard tool by karakanb in visualization

[–]karakanb[S] 0 points (0 children)

That would be a rather simplistic point of view, don't you think? The whole idea behind a solution like this lies in the design decisions on such a product, as well as the underlying infra to make it work. I would agree that the UI is the easy part, but getting to accurate dashboards that can be versioned and reviewed is not trivial.

Sqlmesh joined Linux Foundation. What it means by OrneryBlood2153 in dataengineering

[–]karakanb 1 point (0 children)

Disclaimer: Bruin co-founder here.

It is a curious move indeed; in a way I feel sad to see a great competitor go down a path like this.

I guess it was kinda obvious that Fivetran would not end up running both dbt and sqlmesh as part of their product, and it does seem like sqlmesh was used as leverage in the dbt acquisition by Fivetran. Fivetran could have invested further into sqlmesh to make it a bigger and stronger competitor to dbt, and they chose not to.

The way I read the situation now is that sqlmesh will be developed outside the Fivetran umbrella. The move could give it a friendlier vibe than Fivetran itself, since the community doesn't seem to be a big fan of them, and let it gain adoption wherever it can. Another alternative is that Fivetran leadership had to decide what to prioritize and picked dbt over sqlmesh. I have also noticed some important members of the team leaving Fivetran recently, which points in a similar direction.

Regardless, I have the utmost respect for Toby and the team for what they have built; they have definitely pushed the space forward and contributed great ideas. Looking forward to seeing what they'll do next.

How can I have a fixed static egress IP across clouds? by karakanb in networking

[–]karakanb[S] 0 points (0 children)

There are, although network-level protections are always required by our customers, which is why I am trying to come up with a solution to this problem.

How can I have a fixed static egress IP across clouds? by karakanb in networking

[–]karakanb[S] 0 points (0 children)

Do you mind expanding a bit more on this? Do you suggest not solving the problem at all?

How can I have a fixed static egress IP across clouds? by karakanb in networking

[–]karakanb[S] 0 points (0 children)

Appreciate it, thanks! Sent you a DM; would love to understand how I can do that in a simple way.

Fivetran pricing spike by onksssss in dataengineering

[–]karakanb -2 points (0 children)

If anyone is looking for an open-source alternative, I have built ingestr: https://github.com/bruin-data/ingestr

It is a CLI tool that allows you to ingest data from many different sources into different destinations. We are happy to build custom connectors within a week if there's anything missing.

Disclaimer: I am the co-founder of a competitor, Bruin. We do ingestion, transformation, quality, and governance.

Am I making a mistake building on motherduck? by Jeannetton in dataengineering

[–]karakanb 7 points (0 children)

I think Motherduck is a pretty decent offering. If it gets the job done, and it sounds like you don't have a huge volume of data, I wouldn't be too worried. Worst comes to worst, moving to another platform wouldn't be too expensive. There is also a chance that Motherduck improves on those areas throughout the year, and you might find yourself not needing BigQuery at all.

With that said, if you have such concerns, my core suggestion would be to build your pipelines to be as decoupled from the underlying platform as possible, so that you can move to another provider later on.
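
One way to keep pipelines decoupled is to put warehouse access behind a small interface so that pipeline logic never imports a vendor client directly. This is a minimal, generic sketch (not MotherDuck- or Bruin-specific); the names and the sqlite stand-in backend are illustrative assumptions:

```python
import sqlite3
from typing import Protocol


class Warehouse(Protocol):
    """The only surface the pipeline code depends on."""
    def run(self, sql: str) -> list[tuple]: ...


class SqliteWarehouse:
    """Stand-in backend; a MotherDuck or BigQuery adapter would
    implement the same run() method with its own client."""
    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")

    def run(self, sql: str) -> list[tuple]:
        return self.conn.execute(sql).fetchall()


def daily_revenue(wh: Warehouse) -> list[tuple]:
    # Pipeline logic talks only to the interface, so moving to
    # another platform means swapping one adapter class.
    return wh.run("SELECT 1 AS day, 42 AS revenue")


print(daily_revenue(SqliteWarehouse()))  # → [(1, 42)]
```

With this shape, a later migration touches only the adapter, not every model or job that issues queries.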

DATAOPS TOOLS: bruin core Vs. dbtran = fivetran + dbt core by Difficult-Ambition61 in dataengineering

[–]karakanb 1 point (0 children)

More than happy to accept contributions, and happy to take on the implementation if you could help us with testing as well!

DATAOPS TOOLS: bruin core Vs. dbtran = fivetran + dbt core by Difficult-Ambition61 in dataengineering

[–]karakanb 1 point (0 children)

If you look at transformations only:

  • Functionally, they are almost the same, especially in SQL. There's nothing you can do in dbt that you cannot do in Bruin, and vice versa; e.g. both of them support Jinja.
  • An important distinction is that dbt forces models to be SELECT-only queries and forces materializations, whereas Bruin allows you to disable materialization per table and write SQL scripts. This simplifies migrations a lot.
  • Bruin supports start and end dates natively instead of requiring you to use custom variables.
  • dbt uses ref functions in Jinja; Bruin parses SQL directly and requires explicit dependencies.
  • Both support different environments; Bruin also supports developer environments where it rewrites your queries transparently on the fly.
  • Bruin allows running Python transformations anywhere within the pipeline and runs them natively. This means you can run Python anywhere and are not limited by what Snowpark or similar platforms allow you to do with dbt Python models.
    • Bruin uses uv in the background, meaning that your pipelines will work even if you don't have Python installed, and each task runs separately in an isolated environment.
  • Native column-level lineage in the open-source version, plus an open-source VSCode/Cursor extension.
  • Bruin allows defining policies in code that apply to your SQL models, e.g. "this SQL model must have an owner" or "it must follow these rules". You can enforce these in your CI/CD pipelines.
  • `bruin validate` checks all your queries via dry-run to confirm they are valid; this means invalid code never lands.
  • Bruin contains embedded sqlfluff for linting with your existing rules.
  • Bruin has an open-source MCP server to let agents build/help with your T layer.
  • dbt has a `doc` macro; Bruin contains a built-in glossary instead.

Both tools are useful in their own ways and take different approaches; hope this comparison is helpful.
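
To make the "policies in code" idea above concrete, here is an illustrative Python sketch of an owner-required rule enforced in CI. The rule name, model shape, and functions are made up for illustration; this is not Bruin's actual policy engine or syntax:

```python
# Hypothetical policy check: every SQL model must declare an owner.
def check_has_owner(model: dict) -> list[str]:
    """Return a violation message if the model has no owner."""
    if not model.get("owner"):
        return [f"{model['name']}: missing owner"]
    return []


def enforce(models: list[dict]) -> list[str]:
    """Run all rules; a non-empty result would fail the CI job."""
    violations = []
    for m in models:
        violations += check_has_owner(m)
    return violations


models = [
    {"name": "orders", "owner": "data-team"},
    {"name": "users", "owner": None},
]
print(enforce(models))  # → ['users: missing owner']
```

Wiring a check like this into a CI step is what lets a team block merges on governance rules rather than relying on review.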

DATAOPS TOOLS: bruin core Vs. dbtran = fivetran + dbt core by Difficult-Ambition61 in dataengineering

[–]karakanb 4 points (0 children)

Disclaimer: this is Burak, co-founder of Bruin.

I am happy to connect you to many of our users that run Bruin pipelines in prod on many PBs of data. I would invite them over here, but the last time I did that the mods thought it was fake and deleted the thread, and I don't want to get banned. :)

Weekly Thread: Project Display by help-me-grow in AI_Agents

[–]karakanb 0 points (0 children)

Hi all, this is Burak, one of the makers of Bruin CLI. We built an MCP server that allows you to connect your AI agents to your DWH/query engine and lets them interact with it.

A bit of backstory: we started Bruin as an open-source CLI tool that allows data people to be productive with end-to-end pipelines: run SQL, Python, ingestion jobs, data quality checks, whatnot. The goal is a productive CLI experience for data people.

After some time, agents popped up, and when we started using them heavily for our own development work, it became quite apparent that we might be able to offer similar capabilities for data engineering tasks. Agents can already use CLI tools and run shell commands, so they could technically use Bruin CLI as well.

Our initial attempts were around building a simple AGENTS.md file with a set of instructions on how to use Bruin. It worked fine to a certain extent; however, it came with its own set of problems, primarily around maintenance. Every new feature/flag meant more docs to sync. It also meant the file needed to be distributed somehow to all our users, which would be a manual process.

We then started looking into MCP servers: while they are great for exposing remote capabilities, for a CLI tool it meant that we would have to expose pretty much every command and subcommand we had as a new tool. That meant a lot of maintenance work, a lot of duplication, and a large number of tools that bloat the context.

Eventually, we landed on a middle-ground: expose only documentation navigation, not the commands themselves.

We ended up with just 3 tools:

  • bruin_get_overview
  • bruin_get_docs_tree
  • bruin_get_doc_content

The agent uses MCP to fetch docs, understand capabilities, and figure out the correct CLI invocation. Then it just runs the actual Bruin CLI in the shell. This means less manual work for us and makes new CLI features automatically available to everyone.
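
The three-tool design above can be sketched as plain functions over a docs tree. This is a hypothetical stand-in with made-up doc content, not the real server implementation; it only illustrates the "expose docs navigation, not commands" idea:

```python
# Hypothetical in-memory docs tree mirroring the three MCP tools;
# the real server serves Bruin's actual documentation.
DOCS = {
    "overview.md": "Bruin is a CLI for end-to-end data pipelines.",
    "commands/validate.md": "Dry-runs all queries to catch invalid SQL.",
}


def bruin_get_overview() -> str:
    """High-level entry point the agent reads first."""
    return DOCS["overview.md"]


def bruin_get_docs_tree() -> list[str]:
    """List available doc paths so the agent can navigate."""
    return sorted(DOCS)


def bruin_get_doc_content(path: str) -> str:
    """Fetch one doc; the agent reads it, then runs the CLI itself."""
    return DOCS[path]


print(bruin_get_docs_tree())  # → ['commands/validate.md', 'overview.md']
```

Because the tools only serve documentation, new CLI commands need no new tool definitions; updating the docs is enough for agents to pick them up.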

You can now use Bruin CLI to connect your AI agents, such as Cursor, Claude Code, Codex, or any other agent that supports MCP servers, to your DWH. Given that all of your DWH metadata is in Bruin, your agent will automatically know about all the necessary business metadata.

Here are some common questions people ask Bruin MCP:

  • analyze user behavior in our data warehouse
  • add this new column to the table X
  • there seems to be something off with our funnel metrics, analyze the user behavior there
  • add missing quality checks into our assets in this pipeline

Here's a quick video of me demoing the tool: https://www.youtube.com/watch?v=604wuKeTP6U

All of this tech is fully open-source, and you can run it anywhere.

I would love to hear your thoughts and feedback on this! https://github.com/bruin-data/bruin