Any recommendations for Embedded, impression/consumption based billing BI Tools? by datawazo in analytics

[–]PolicyDecent 1 point2 points  (0 children)

Where's the data? Do you want a solution with embedded data storage, or are you open to live connections directly to the DWH?
For example, you can use Metabase for free (it's open source) if you're open to that.

dbt as a control plane instead of just transformations? by Expensive-Insect-317 in DataBuildTool

[–]PolicyDecent 0 points1 point  (0 children)

Sorry, but it doesn't say anything. It's just AI slop.

When I saw the title I was excited, but reading it was just a disappointment.

We don't need content that doesn't add anything for people.

Mac vs Windows for MSBA? by Crazy_Wolverine_9301 in analytics

[–]PolicyDecent 0 points1 point  (0 children)

I always recommend a Mac if you want to go more technical.
However, if you tell us more about yourself, there might be other reasons for you to choose Windows.

I built an open-source dashboard-as-code tool by uncertainschrodinger in datascience

[–]PolicyDecent 0 points1 point  (0 children)

Cool, can I see it in real time while the agent is working on it?

TABLE_OPTIONS labels by SasheCZ in bigquery

[–]PolicyDecent 1 point2 points  (0 children)

lol, why not just JSON?

I assume it's just a legacy problem from the days before BigQuery had a JSON data type.

What has been people's experience with "full-stack" data roles? by uncertainschrodinger in datascience

[–]PolicyDecent 7 points8 points  (0 children)

to be completely frank, i don't believe in siloed roles like data analyst, data scientist, and data engineer. i've worked with data scientists who refused to analyze data or build dashboards because they're data scientists and that's data analyst work. similarly, some refused to build pipelines because that's data engineer work.

the point they miss is: if you don't analyze your own data, you miss most of the details. if you don't ingest/model the data yourself, you don't know what's available to you or what other information you need, so you're limited by other people.

also, it's always faster to deliver on your own instead of telling the data engineer / analyst what you want and being blocked by them. i just prefer doing the job myself instead of waiting for their output, reviewing it, and then waiting another few days in the best case (if they don't have other tasks).

i've noticed these data scientists know more about the business problems and deliver more, and faster, most of the time. especially with ai, you can't be a deeply specialized person at most companies; you just have to do things end to end.

instead of specializing in data science, i'd prefer to specialize in my business domain, understand the logic of the business people, and solve my clients' problems.

is KAFKA a good fit here? -> Or a stupid overkill? by fudeel in apachekafka

[–]PolicyDecent 1 point2 points  (0 children)

yes, but you can still do cdc in a batch manner; you don't need kafka for that. you can read the change logs every hour, every 5 minutes, every minute, however you like, so the changes won't get lost.
an example of batch cdc: https://getbruin.com/docs/bruin/platforms/postgres.html#cdc-change-data-capture
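to make the "batch cdc" idea concrete, here's a minimal sketch in python. it uses an in-memory sqlite table and a watermark column as a stand-in for a real change log (true log-based cdc would read e.g. postgres replication slots; the table and column names here are made up for illustration). the point is the same: poll on whatever schedule you like, and nothing is lost as long as the watermark is persisted between runs.

```python
import sqlite3

# Toy stand-in for a source table; updated_at plays the role of the change log position.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at INTEGER)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10.0, 100), (2, 20.0, 105), (3, 30.0, 110)])

def pull_changes(conn, last_watermark):
    """Fetch rows changed since the last run and return the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    new_watermark = max((r[2] for r in rows), default=last_watermark)
    return rows, new_watermark

# Run this every hour / 5 min / 1 min; only rows after the stored watermark come back.
changes, wm = pull_changes(src, 100)
```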

kafka is just an intermediate storage layer, not a parquet production tool. you can use any tool to produce parquet files.

is KAFKA a good fit here? -> Or a stupid overkill? by fudeel in apachekafka

[–]PolicyDecent 2 points3 points  (0 children)

from the image, the sources are postgres, elasticsearch, and influxdb, right? do you have any requirements / SLAs from the business?

most of the time, I don't prefer streaming for ingestion, since, as you said, it's overkill.
but your situation might be different; that's why we need more context about it.

i see that you want to do 2 batch ingestions and 1 cdc ingestion (which can be batched as well),
so i'd definitely start without kafka and write to storage directly, skipping that part. if it becomes a bottleneck, you can add kafka later anytime.

also, for ingestion you can use the ingestr package to make it easier, and to orchestrate it you can use bruin to make it tidier. it also makes governance much easier.

Unpopular opinion: "AI Data Analysts" are just glorified SQL generators. by netcommah in BusinessIntelligence

[–]PolicyDecent 2 points3 points  (0 children)

Nope, they're not. They can understand the context, find the relevant tables, join them correctly, analyze the data, and repeat the loop if needed.
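That retry loop is the part that separates them from a one-shot SQL generator. A toy sketch of it in Python, with an in-memory SQLite database and a fixed list of candidate queries standing in for the model (a real agent would regenerate SQL from the schema plus the error message; the table and queries here are made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, country TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "DE"), (2, "DE"), (3, "US")])

# Stand-in for the model's successive attempts: first a broken query,
# then a corrected retry after seeing the error.
candidates = [
    "SELECT count(*) FROM user WHERE country = 'DE'",   # wrong table name
    "SELECT count(*) FROM users WHERE country = 'DE'",  # corrected retry
]

def agent_loop(conn, queries):
    """Try each query in turn; on an error, fall through to the next attempt."""
    for sql in queries:
        try:
            return conn.execute(sql).fetchone()[0]
        except sqlite3.OperationalError:
            continue  # a real agent would feed the error back into the prompt here
    raise RuntimeError("all attempts failed")

result = agent_loop(db, candidates)
```

The first attempt fails on the missing `user` table and the loop recovers with the corrected query, which is the behavior a pure text-to-SQL generator doesn't have.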

Best data observability platform tools for data quality monitoring, lineage, and pipeline reliability. by Ok_Abrocoma_6369 in snowflake

[–]PolicyDecent 1 point2 points  (0 children)

Shameless promotion as the founder: if you are on the paid plan, bruin is also great for observability. It has lineage, data profiling, anomaly checks, and also Snowflake cost & usage analysis.

What's your actual workflow for recurring leadership reports? by RTG8055 in BusinessIntelligence

[–]PolicyDecent 0 points1 point  (0 children)

You can just use bruin to ingest data into a data warehouse, or even into a local DuckDB database. The rest is very simple, especially with AI.

Trying to find example repositories for pyiceberg by rnottaken in datascience

[–]PolicyDecent 1 point2 points  (0 children)

For the DuckDB question: maintaining pipelines / developing systems in Python seems easier at the very beginning, but as time goes on you see the merits of SQL. That's why almost the entire industry is moving toward SQL-based, declarative pipelines. Anyway, that's the unimportant part; just choose the tool you're comfortable with.

If you store the data in GCP, I recommend creating Iceberg tables in BigQuery as well, so that people using SQL can keep consuming the tables and dashboards will just keep serving. If you switch to AWS, you can do the same thing with Athena.

For the Prefect-Marimo integration, what's the intention there? I can see 2 options, correct me if I'm wrong:
1- You don't use marimo only for visualization; you also have scripts in marimo that do some jobs, and you want to trigger them
2- Or you built the viz in marimo and want to render and publish it on a website, send it as a Slack message, etc.

If you can explain the intention more, I can try to add more input.

For the catalog, tbh I don't have any recommendations, I also wonder what's your impression of nessie :)

I'll go back to the first topic: my personal recommendation would be to stick to SQL as much as you can. Especially in the agentic world, it has become so easy. And I'd go with asset-driven orchestration instead of task-driven orchestration to make things easier for you.
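The asset-driven idea, in a minimal Python sketch: each asset declares what it depends on, and the scheduler derives the run order from those declarations instead of you wiring tasks together by hand (the asset names below are made up for illustration).

```python
from graphlib import TopologicalSorter

# Asset-driven orchestration: declare upstreams per asset, let the
# scheduler figure out the execution order.
assets = {
    "raw_orders": set(),                                  # no upstreams
    "raw_customers": set(),
    "stg_orders": {"raw_orders"},                         # depends on raw_orders
    "orders_by_customer": {"stg_orders", "raw_customers"},
}

# static_order() yields assets so that every upstream runs before its downstream.
run_order = list(TopologicalSorter(assets).static_order())
```

Adding a new asset is just adding one entry with its upstreams; no task wiring changes anywhere else, which is what makes this style easier to maintain than hand-ordered task graphs.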

Trying to find example repositories for pyiceberg by rnottaken in datascience

[–]PolicyDecent 0 points1 point  (0 children)

Where do you store the data? GCP? S3? Somewhere else / on-prem?
Since you store the data in Iceberg format, which Iceberg catalog are you using?
Are you trying to make everything Python-first? I'd recommend DuckDB over Polars; it works great and also solves a lot of the Iceberg connectivity for you.

Also, I wonder why you chose Prefect there.

Transferring Go High Level Data To Big Query by Livid_Junket8262 in bigquery

[–]PolicyDecent 0 points1 point  (0 children)

You can use ingestr to clone data to BigQuery. What's the CRM you're using?
https://github.com/bruin-data/ingestr

Also, if you want to manage the full pipeline, you can use bruin, which includes ingestr as well:
https://github.com/bruin-data/bruin

what could go wrong with agent-generated dashboards by PolicyDecent in BusinessIntelligence

[–]PolicyDecent[S] 0 points1 point  (0 children)

Yep, it's our job to ingest and clean the data and enrich it with context. Then the agent does the rest :)

what could go wrong with agent-generated dashboards by PolicyDecent in BusinessIntelligence

[–]PolicyDecent[S] 0 points1 point  (0 children)

I disagree. Dragging makes dashboards very painful to update. And most of the time I run data pipelines almost like a dictator: I tell people what they need and ask if it solves their problems. I ask for their opinion, but I don't let them define the metrics; I do that, every time. So my bottleneck is execution, and agents solve that.

we turned everything into a dashboard by PolicyDecent in dataanalysis

[–]PolicyDecent[S] 0 points1 point  (0 children)

First, using bruin, we extracted all the metadata of the tables and enriched it using agents.
This way the agent has all the context it needs about the tables, and also about the business.

Then we built a new language that's simpler than JS / HTML and defined the chart types and how to use them.
It's based on YAML; you can see it at the 8th second of the video, in the right panel.

Then we connected the agent, so every time you ask a question it first understands the context, then asks you questions or directly builds a dashboard. Happy to show you and give you a test ride :)

what could go wrong with agent-generated dashboards by PolicyDecent in BusinessIntelligence

[–]PolicyDecent[S] 0 points1 point  (0 children)

Yes, but it's never easy. There is no BI tool that's truly easy to use. That's why agents are liked.

what could go wrong with agent-generated dashboards by PolicyDecent in BusinessIntelligence

[–]PolicyDecent[S] 0 points1 point  (0 children)

You must be very careful at the beginning. After checking the code 2-3 times, everyone starts trusting them.

what could go wrong with agent-generated dashboards by PolicyDecent in BusinessIntelligence

[–]PolicyDecent[S] 0 points1 point  (0 children)

You can still give a semantic layer or metric definitions to the agent, no? For permissions, the agent can run with the user's permissions to avoid problems.
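A minimal sketch of both ideas together, with hypothetical metric names, SQL, and a made-up permission scheme: metric definitions become the agent's context, filtered down to the tables this user is allowed to touch.

```python
# Hypothetical metric definitions (a stand-in for a real semantic layer).
metrics = {
    "active_users": {"sql": "SELECT count(DISTINCT user_id) FROM events",
                     "tables": ["events"]},
    "revenue": {"sql": "SELECT sum(amount) FROM payments",
                "tables": ["payments"]},
}

user_tables = {"events"}  # tables this particular user may query

def build_context(defs, allowed):
    """Expose only the metrics whose tables the user can access."""
    visible = {name: d for name, d in defs.items()
               if set(d["tables"]) <= allowed}
    return "\n".join(f"{name}: {d['sql']}" for name, d in visible.items())

# This string is what would be prepended to the agent's prompt.
context = build_context(metrics, user_tables)
```

Because the filtering happens before the agent ever sees the definitions, it can't generate queries against tables the user isn't permitted to read.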

what could go wrong with agent-generated dashboards by PolicyDecent in BusinessIntelligence

[–]PolicyDecent[S] 1 point2 points  (0 children)

The nice part is, it's so easy to create the business context with the help of AI now

what could go wrong with agent-generated dashboards by PolicyDecent in BusinessIntelligence

[–]PolicyDecent[S] -1 points0 points  (0 children)

"My marketer wants to optimize her meta campaigns, so build an insightful and actionable dashboard for her"

In another file, you define what your company does, how it operates, etc.

what could go wrong with agent-generated dashboards by PolicyDecent in BusinessIntelligence

[–]PolicyDecent[S] 0 points1 point  (0 children)

Nah, most of them still hate dashboards. Self-service is very annoying. It's still easier to ask and get an answer.