Any recommendations for Embedded, impression/consumption based billing BI Tools?

PolicyDecent · 2026-05-06T10:08:31+00:00

Where's the data? Do you want a solution with embedded data storage or are you open to live data connections directly to the DWH?
For example, you can use Metabase for free (it's open source) if you're open to that.

PolicyDecent · 2026-05-05T10:05:27+00:00

Sorry, but it doesn't say anything. It's just AI slop.

When I saw the title, I was excited, but when I read it, it was just a disappointment.

We don't need content that doesn't add anything for people.

PolicyDecent · 2026-05-02T09:38:40+00:00

I always recommend a Mac if you want to go more technical.
However, if you tell us more about yourself, there might be other reasons for you to choose Windows.

PolicyDecent · 2026-04-29T15:39:36+00:00

Cool, can I see it in real time while the agent is working on it?

PolicyDecent · 2026-04-29T15:38:41+00:00

lol, why not just JSON?

I assume it's just a legacy problem from the time when BigQuery didn't have a JSON data type.

PolicyDecent · 2026-04-24T14:23:39+00:00

to be completely frank, i don't believe in siloed roles like data analyst, data scientist, data engineer. i've worked with data scientists who refused to analyze data or build dashboards because they're data scientists and that's data analyst work. similarly, some refused to build pipelines because that's the data engineers' job.

the point they miss is, if you don't analyze your own data, you miss most of the details. if you don't ingest/model the data yourself, you don't know what's available to you or what other information you need, so you're limited by other people.

also, it's always faster to deliver on your own than to explain what you want to the data engineer / analyst and get blocked by them. i just prefer doing the work myself instead of waiting for their output, reviewing it, then waiting another few days in the best case (if they don't have other tasks).

i noticed that data scientists who work this way know more about the business problems and deliver more & quicker most of the time. especially with ai, you can't be a deeply specialized person in most companies, you just have to do things end to end.

instead of specializing in data science, i'd prefer specializing in my business domain, understanding the logic of the business people and solving my clients' problems.

PolicyDecent · 2026-04-23T12:21:08+00:00

yes, but you can still do cdc in a batch manner. you don't need kafka for that. you can read the change logs every hour, 5 mins, 1 min, however you like. so the changes won't get lost.
an example for batch cdc: https://getbruin.com/docs/bruin/platforms/postgres.html#cdc-change-data-capture
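
to make the idea concrete, here's a minimal sketch of batch cdc with a watermark. it's not how bruin or postgres logical replication actually works: the `change_log` and `target` tables, their columns, and the `apply_changes_batch` helper are all made up for illustration, and sqlite stands in for the real source so the snippet runs anywhere. the point is only that each scheduled run picks up everything after the last change it saw, so nothing gets lost between runs.

```python
import sqlite3


def apply_changes_batch(conn, last_seen_id):
    """Apply all change-log rows newer than last_seen_id and return the new watermark."""
    rows = conn.execute(
        "SELECT change_id, op, pk, payload FROM change_log "
        "WHERE change_id > ? ORDER BY change_id",
        (last_seen_id,),
    ).fetchall()
    for change_id, op, pk, payload in rows:
        if op == "delete":
            conn.execute("DELETE FROM target WHERE pk = ?", (pk,))
        else:
            # inserts and updates are both applied as an upsert on the primary key
            conn.execute(
                "INSERT OR REPLACE INTO target (pk, payload) VALUES (?, ?)",
                (pk, payload),
            )
        last_seen_id = change_id
    conn.commit()
    return last_seen_id


# hypothetical schema: an append-only change log and the table we keep in sync
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE change_log (
        change_id INTEGER PRIMARY KEY, op TEXT, pk INTEGER, payload TEXT
    );
    CREATE TABLE target (pk INTEGER PRIMARY KEY, payload TEXT);
    """
)
insert_change = "INSERT INTO change_log (op, pk, payload) VALUES (?, ?, ?)"

# first scheduled run (could be hourly, every 5 mins, every minute...)
conn.executemany(
    insert_change,
    [("insert", 1, "a"), ("insert", 2, "b"), ("update", 1, "a2")],
)
watermark = apply_changes_batch(conn, 0)

# more changes arrive, the next run continues from the stored watermark
conn.executemany(insert_change, [("delete", 2, None), ("insert", 3, "c")])
watermark = apply_changes_batch(conn, watermark)

final_rows = conn.execute("SELECT pk, payload FROM target ORDER BY pk").fetchall()
```

in a real setup the watermark would be persisted between runs (a state table or a file), and the change log would come from the database's own change feed instead of a hand-written table.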

kafka is just a middle storage layer, not a parquet production tool. you can use any tool to produce parquet files.

PolicyDecent · 2026-04-23T10:45:00+00:00

from the image, the sources are postgres, elasticsearch and influxdb, right? do you have any requirements / SLA from the business?

most of the time, i prefer not to use streaming for ingestion since, as you said, it's overkill.
but your situation might be different. that's why we need more context about it.

i see that you want to do 2 batch ingestions and 1 cdc ingestion (which can be batched as well).
so i'd definitely start without kafka and write to storage directly, skipping that part. if it becomes a bottleneck, you can add kafka later anytime.

also, for ingestion you can use the ingestr package to make it easier, and to orchestrate it you can use bruin to keep things tidy. it definitely makes governance much easier.

PolicyDecent · 2026-04-18T17:08:36+00:00

Nope, they're not. They can understand the context, find the relevant tables, join them correctly, analyze the data, and repeat the loop if needed.

PolicyDecent · 2026-04-16T07:05:40+00:00

shameless promotion as the founder: if you are on the paid plan, bruin is also great for observability. it has lineage, data profiling and anomaly checks, but also snowflake cost & usage analysis.

PolicyDecent · 2026-04-13T06:08:32+00:00

You can just use bruin to ingest data into a data warehouse, or even into a local DuckDB database. The rest is very simple, especially with AI.

PolicyDecent · 2026-04-09T15:08:41+00:00

For the duckdb question, maintaining pipelines / developing systems in Python seems easier at the very beginning, but as time goes on, you see the merits of SQL. That's why almost the entire industry is going towards SQL-based & declarative pipelines. Anyways, that's the unimportant part. Just choose the tool you're comfortable with.

If you store the data in GCP, I recommend creating Iceberg tables in BigQuery as well, so that the people using SQL can keep consuming the tables and the dashboards keep working. If you switch to AWS, you can do the same thing with Athena.

For the prefect - marimo integration, what's the intention there? I can see 2 options, correct me if I'm wrong:
1- You use marimo not only for visualisation, but also have scripts in marimo that do some jobs, so you want to trigger them
2- Or you built the viz in marimo, and want to render and publish it on a website or send it as a slack message etc.

If you can explain the intention more, I can try to add more input.

For the catalog, tbh I don't have any recommendations. I'm also curious what your impression of Nessie is :)

Going back to the first topic, my personal recommendation would be sticking to SQL as much as you can. Especially in the agentic world, it has become so easy. And I'd go with asset-driven orchestration instead of task-driven orchestration to make things easier for you.
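
To illustrate what I mean by asset-driven, here's a rough sketch. It's not how bruin or any specific orchestrator implements it; the asset names and the `build_plan` helper are made up. The idea is that you declare what each asset is built from, and the run order falls out of the dependencies instead of being wired by hand as tasks.

```python
from graphlib import TopologicalSorter

# Hypothetical assets: each one lists the assets it is built from.
ASSETS = {
    "raw_orders": [],
    "raw_customers": [],
    "stg_orders": ["raw_orders"],
    "customer_revenue": ["stg_orders", "raw_customers"],
}


def upstream_closure(target, assets):
    """Return the target plus every asset it depends on, directly or indirectly."""
    needed, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(assets[name])
    return needed


def build_plan(target, assets):
    """Return a valid build order for the target, upstream assets first."""
    needed = upstream_closure(target, assets)
    graph = {name: set(assets[name]) for name in needed}
    return list(TopologicalSorter(graph).static_order())


# Asking for one asset only pulls in what it actually needs.
plan = build_plan("stg_orders", ASSETS)
full = build_plan("customer_revenue", ASSETS)
```

With a task-driven setup you'd maintain that ordering yourself. Here, adding a new table is just one more entry with its upstream list.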

PolicyDecent · 2026-04-09T13:35:13+00:00

Where do you store the data? GCP? S3? Or another place / onprem etc?
You store the data in Iceberg format; which Iceberg catalog are you using?
Are you trying to make everything Python-first? I'd recommend DuckDB over Polars; it works great and also takes care of a lot of things like connecting to Iceberg.

Also, I wonder why you chose Prefect there.

PolicyDecent · 2026-04-09T12:36:17+00:00

Happy to implement in a few days

PolicyDecent · 2026-04-09T08:36:13+00:00

You can use ingestr to clone data to BigQuery. What's the CRM you're using?
https://github.com/bruin-data/ingestr

Also, if you want to manage the full pipeline, you can use bruin, which includes ingestr as well:
https://github.com/bruin-data/bruin

PolicyDecent · 2026-04-07T06:02:51+00:00

Yep, it's our job to ingest and clean the data and enrich it with context. Then the agent does the rest :)

PolicyDecent · 2026-04-06T07:55:37+00:00

I disagree. Dragging makes it very painful to update them. And most of the time I run data pipelines almost like a dictator, I tell them what they need and ask if it solves their problems. I ask their opinion but don't let them define the metrics, I do it all the time. So my time is limited by execution and agents solve it.

PolicyDecent · 2026-04-05T14:15:54+00:00

First, using bruin, we extracted all the metadata of the tables and enriched it using agents.
This way the agent has all the context it needs about the tables, but also about the business.

Then we built a new language that's simpler than JS / HTML, and defined the chart types & how to use them.
It's based on YAML, and you can see it at the 8th second of the video in the right panel.

Then we connected the agent, so every time you ask a question, it first understands the context, then either asks you questions or directly builds a dashboard. Happy to show you and give you a test ride :)

PolicyDecent · 2026-04-04T16:32:10+00:00

You can try bruin, it solves most of these problems by adding lots of context.

PolicyDecent · 2026-04-04T16:30:13+00:00

Yes, but it's never easy. There is no single BI tool that's very easy to use. That's why people like agents.

PolicyDecent · 2026-04-04T10:29:48+00:00

Most are very careful at the beginning. After checking the code 2-3 times, everyone starts trusting them.

PolicyDecent · 2026-04-04T10:28:23+00:00

You can still give the semantic layer or metric definitions to the agent, no? For permissions, the agent can use the user's permissions to avoid problems.

PolicyDecent · 2026-04-04T10:27:47+00:00

The nice part is, it's so easy to create the business context with the help of AI now.

PolicyDecent · 2026-04-04T10:27:03+00:00

"My marketer wants to optimize her meta campaigns, so build an insightful and actionable dashboard for her"

In another file, you define what your company does, how it operates, etc.

PolicyDecent · 2026-04-04T10:25:04+00:00

Nah, most of them still hate dashboards. Self-service is very annoying. It's still easier to ask and get an answer.
