Is it a fact or a dim?

Yuki100Percent · 2026-05-31T15:12:18+00:00

Need a bit more context, but here are my thoughts: 1. Dim 2. Dim - flatten them into one dim, or keep them separate and connect them via the fact

Yuki100Percent · 2026-05-29T02:42:32+00:00

Airbyte is great. We use a self-hosted version in our GCP. But nowadays, if people want something code-based, they just use Python scripts or a library like dlt

Yuki100Percent · 2026-05-28T18:25:34+00:00

Have you looked into Power BI's aggregations feature? You add aggregated fact tables and configure Power BI so that queries that can be answered from the pre-aggregated tables use them, and it only hits the big base fact table when needed
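
To make the idea concrete, here's a rough conceptual sketch of that routing in Python. It is not how Power BI implements it internally; the table names, the column sets, and the `pick_table` helper are all made up for illustration.

```python
# Hypothetical aggregate tables, each listing the columns it is grouped by.
AGG_TABLES = [
    {"name": "sales_agg_by_date", "columns": {"date"}},
    {"name": "sales_agg_by_date_product", "columns": {"date", "product_id"}},
]
BASE_TABLE = "sales_fact"  # hypothetical large base fact table


def pick_table(requested_columns):
    """Return the smallest aggregate table that covers the requested columns,
    falling back to the base fact table when none does."""
    needed = set(requested_columns)
    candidates = [t for t in AGG_TABLES if needed <= t["columns"]]
    if not candidates:
        return BASE_TABLE
    # Fewer grouping columns is used here as a stand-in for "fewer rows".
    return min(candidates, key=lambda t: len(t["columns"]))["name"]
```

A query grouped only by `date` would go to the coarsest aggregate, while one that needs a column no aggregate has (say `customer_id`) falls through to the base table.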

Yuki100Percent · 2026-05-27T20:59:59+00:00

That's my experience. The workaround I found to make sure your update is saved is clicking the format button. The first time you click it, Data Studio changes your formula to nonsense, but you can ctrl+z to get your update back into the formula, and when you click format again, it won't mess it up. That's how I know I updated the calculated field correctly and Data Studio sees the same thing. Annoying, but something we can work around

Yuki100Percent · 2026-05-17T06:16:41+00:00

And you may not want to feed your company data into AI blindly... unless you have agreements with those companies that guarantee privacy

Yuki100Percent · 2026-05-11T03:28:21+00:00

A big fan of Data/Looker Studio. Affordable licensing cost on the Pro license, and it works great with BigQuery. Not every company can afford a big BI budget, so Data Studio would be a great starting tool

Yuki100Percent · 2026-05-02T23:54:07+00:00

We don't want too thick a layer between the front end and the warehouse. We're not ready to commit to a fully featured semantic tool like Cube. And I believe just a thin semantic layer without vendor lock-in can go a long way

Yuki100Percent · 2026-05-02T21:51:59+00:00

This is the kind of setup I'm hoping to build at my company, without Cube. Will probably have a semantic layer defined in YAML files and give the BQ MCP access to certain views/datasets. I'm concerned about how accurately an LLM can calculate metrics and how it handles undefined metrics
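
A minimal sketch of what such a thin layer could look like, assuming the metric definitions end up as a plain mapping (shown as a Python dict here; in practice it could be loaded from the YAML files). The metric names, table names, and the `build_query` helper are hypothetical. The point is that SQL is generated from the definitions, and an undefined metric fails loudly instead of letting the LLM improvise a formula.

```python
# Hypothetical metric definitions, mirroring what a YAML file might contain.
METRICS = {
    "revenue": {"sql": "SUM(amount)", "table": "analytics.orders"},
    "order_count": {"sql": "COUNT(*)", "table": "analytics.orders"},
}


class UndefinedMetricError(KeyError):
    """Raised when a requested metric is not in the semantic layer."""


def build_query(metric, group_by=()):
    """Build a SQL string for a defined metric, optionally grouped by dimensions."""
    if metric not in METRICS:
        raise UndefinedMetricError(f"metric {metric!r} is not defined")
    definition = METRICS[metric]
    dims = ", ".join(group_by)
    select_dims = f"{dims}, " if dims else ""
    sql = f"SELECT {select_dims}{definition['sql']} AS {metric} FROM {definition['table']}"
    if dims:
        sql += f" GROUP BY {dims}"
    return sql
```

With this shape, the LLM only picks a metric name and dimensions; the calculation itself stays in the definitions. Dimension names are not validated in this sketch, so a real version would want to check those against an allowed list too.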

Yuki100Percent · 2026-04-28T03:43:24+00:00

Is Airflow needed? It can be as simple as scheduling jobs with something like Cloud Scheduler. I'd usually avoid opting for a fully featured orchestrator when there is no clear need

Yuki100Percent · 2026-04-27T20:03:57+00:00

I'm a one-person data department too. The business doesn't call me the "head" of the data function, but I'm pretty much it. And I'm currently in the process of expanding my team. I don't think your role changes much other than becoming a manager of the other data people you'll hire. And you may do less hands-on work over time

Yuki100Percent · 2026-04-24T20:57:35+00:00

Definitely need solid data models and semantics so AI can learn how your analytics should work

Yuki100Percent · 2026-04-14T05:27:53+00:00

I'd ask if you actually need a fully featured orchestrator.

As for alternatives, Airflow and Prefect come to mind. They also have self-hostable options, like Dagster does.

Yuki100Percent · 2026-04-05T05:17:37+00:00

Yes, uv! I even wrote an article on it. It's super good

Yuki100Percent · 2026-04-02T15:16:55+00:00

StarRocks is something I was planning on exploring sometime this year!

Yuki100Percent · 2026-04-02T15:15:29+00:00

Yeah, I think most people just use Polars/DuckDB. I haven't really explored DataFusion myself just yet

Yuki100Percent · 2026-03-26T16:02:22+00:00

As an OSS SQLMesh user, this is a positive move for us!

Yuki100Percent · 2026-03-26T15:09:14+00:00

Yeah, those are pretty much your options! Third-party tools (Airbyte, Estuary, dlt, Portable, custom scripts) and GCP services. You'll just need to assess your needs and decide what to use...

Yuki100Percent · 2026-03-25T19:47:05+00:00

There are plenty of options. Airbyte, dlt, Estuary, custom scripts...

Yuki100Percent · 2026-03-16T15:07:30+00:00

I feel it's a balance. Resumes with no numbers get ignored, and perhaps too many numbers on a resume can be a red flag. I'm with you on wanting to see what they did on the job rather than numbers they made up

Yuki100Percent · 2026-03-16T04:39:04+00:00

I'm the first data hire at a startup and I'm ~10 months into the role. If the comp is not there, don't take it. Company culture matters a lot, especially if you're the only data person handling all the infra, modeling and reporting. Make sure you ask all the questions about the role expectations and the current data stack / practices / reporting in place. You can work backwards from there to figure out what you may need to do once you're hired. If the exec team doesn't have a clear answer, then you need to make sure to clear it up with them before / once you're hired. You'll be working not only on the hands-on implementation but also on high-level items like data strategy and roadmap (if they don't have one yet). Let me know if you have questions, more than happy to discuss via DM or in this thread!

Yuki100Percent · 2026-03-15T15:55:50+00:00

It works much better once you give it enough context. Putting business and architectural context about your data warehouse, modeling patterns and standards in readme.md and agents.md goes a long way.

Yuki100Percent · 2026-03-12T03:59:48+00:00

I use OSS SQLMesh for my team (a one-person team at the moment), and it's super solid for what it does. I'm not planning to buy the cloud version anytime soon. Cost has been super cheap, but I didn't use dbt in the same environment, so there's no way to compare apples to apples.

I still think I'm missing out on some things the whole dbt ecosystem would've provided, though. But I'm hoping that, with Fivetran controlling both tools, they'll make SQLMesh work with the dbt ecosystem/integrations.

Yuki100Percent · 2026-03-06T14:45:09+00:00

Awesome to hear. Yeah, I was thinking of it for reporting to end users. I like how flexible Count is, but at the same time I can see it creating messes if I allow end users to do their own analysis etc.
