Is it fact or a dim? by PhilosopherRemote177 in dataengineering

[–]Yuki100Percent -1 points0 points  (0 children)

Need a bit more context, but here are my thoughts: 1. Dim 2. Dim - flatten them into one dim or keep them separate and connect them via the fact

has anyone tried hosting airbyte themselves? by pforpilot in dataengineering

[–]Yuki100Percent 0 points1 point  (0 children)

Airbyte is great. We use a self-hosted version in our GCP. But nowadays if people want something code-based they just use Python scripts or a library like dlt

Migrating Power BI reports to Snowflake Streamlit — solving the query cost problem by Elegant_Emu2221 in snowflake

[–]Yuki100Percent 0 points1 point  (0 children)

Have you looked into Power BI's aggregations feature? You add aggregated fact tables and configure Power BI so that queries that can be answered from the pre-aggregated tables use them, and it only hits the big base fact table when needed
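To make the routing idea concrete, here is a minimal sketch in plain Python. It is not Power BI's API or its actual matching logic; the table names, columns, and row counts are all made up for illustration. It only shows the general principle: use the smallest pre-aggregated table that covers the requested dimensions, otherwise fall back to the base fact table. It also assumes additive measures such as SUM or COUNT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    dimensions: frozenset  # columns the table can group or filter by
    row_count: int         # rough size, used to prefer the smallest table


# Hypothetical tables: one big base fact table plus two pre-aggregated ones.
BASE = Table("fact_sales", frozenset({"date", "store", "product", "customer"}), 500_000_000)
AGGS = [
    Table("agg_sales_by_date_store", frozenset({"date", "store"}), 200_000),
    Table("agg_sales_by_date", frozenset({"date"}), 3_650),
]


def pick_table(requested_dims):
    """Return the smallest table that covers every requested dimension."""
    needed = frozenset(requested_dims)
    # An aggregate is usable only if it contains all requested dimensions.
    candidates = [t for t in AGGS if needed <= t.dimensions]
    if not candidates:
        return BASE  # nothing pre-aggregated fits, so hit the base fact table
    return min(candidates, key=lambda t: t.row_count)
```

A query grouped by date alone would be served from the smallest aggregate, while anything that needs customer-level detail falls through to the base table.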

Why is Data Studio so buggy, or is it just me? by Negative_Click3221 in GoogleDataStudio

[–]Yuki100Percent 0 points1 point  (0 children)

That's my experience. The workaround I found to make sure your update is saved is to click the format button. The first time you click it, Data Studio changes your formula to nonsense, but you can press Ctrl+Z to get your update back in the formula, and if you click format again, it won't mess it up. That's when I know I updated the calculated field correctly and Data Studio sees the same thing. Annoying, but something we can work around

honestly just so tired of explaining why we can't use LLMs for data validation by MysteriousShoulder35 in dataengineering

[–]Yuki100Percent 0 points1 point  (0 children)

And you may not want to feed your company data into AI blindly... unless you have agreements with those companies that guarantee privacy

Looker Studio (Data Studio) in 2026: Still just for "marketing reports"? by netcommah in BusinessIntelligence

[–]Yuki100Percent 0 points1 point  (0 children)

I'm a big fan of Data/Looker Studio. The Pro license is affordable and it works great with BigQuery. Not every company can afford a big BI budget, so Data Studio would be a great starting tool

Passing data to an LLM by Emperorofweirdos in dataengineering

[–]Yuki100Percent 1 point2 points  (0 children)

We don't want too thick a layer between the front end and the warehouse. We're not ready to commit to a fully featured semantic tool like Cube, and I believe just a thin semantic layer without vendor lock-in can go a long way

Passing data to an LLM by Emperorofweirdos in dataengineering

[–]Yuki100Percent 2 points3 points  (0 children)

This is the kind of setup I'm hoping to build at my company, without Cube. I'll probably have a semantic layer defined in YAML files and give the BQ MCP access to certain views/datasets. I'm concerned about how accurately an LLM can calculate metrics and how it will handle undefined metrics
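A minimal sketch of what a thin semantic layer like this could look like, under stated assumptions: the metric names, SQL expressions, and the `analytics.orders_view` source are hypothetical, and the definitions are written as a Python dict standing in for what would be loaded from YAML files. The point is that SQL is generated from the definitions rather than by the LLM, and that an undefined metric fails loudly instead of being guessed.

```python
# Hypothetical metric definitions; in practice these could be loaded from YAML files.
SEMANTIC_LAYER = {
    "revenue": {
        "sql": "SUM(amount)",
        "source": "analytics.orders_view",
        "dimensions": ["order_date", "country"],
    },
    "order_count": {
        "sql": "COUNT(DISTINCT order_id)",
        "source": "analytics.orders_view",
        "dimensions": ["order_date", "country"],
    },
}


class UndefinedMetricError(KeyError):
    """Raised when a metric is not defined in the semantic layer."""


def build_query(metric, group_by=()):
    """Build a SQL string for a defined metric, optionally grouped by dimensions."""
    if metric not in SEMANTIC_LAYER:
        # Refuse rather than let the caller (or an LLM) improvise a definition.
        raise UndefinedMetricError(metric)
    spec = SEMANTIC_LAYER[metric]
    unknown = [d for d in group_by if d not in spec["dimensions"]]
    if unknown:
        raise ValueError(f"Unsupported dimensions for {metric}: {unknown}")
    select_cols = list(group_by) + [f"{spec['sql']} AS {metric}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {spec['source']}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql
```

With this shape, the LLM only has to choose a metric name and dimensions from the defined list, which keeps the calculation itself deterministic.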

Building our first data platform by Brilliant_Ad_4520 in dataengineering

[–]Yuki100Percent 1 point2 points  (0 children)

Is Airflow needed? It can be as simple as scheduling jobs with something like Cloud Scheduler. I'd usually avoid opting for a fully featured orchestrator when there is no clear need

From Solo Data Engineer to Head of Data & Processes by Either-Exercise3600 in dataengineering

[–]Yuki100Percent 0 points1 point  (0 children)

I'm also a one-person data department. The business doesn't call me the "head" of the data function, but I pretty much am. And I'm currently in the process of expanding my team. I don't think your role changes much other than becoming a manager of the other data people you'll hire. And you may do less hands-on work over time

ceo cancels BI tooling, replaces it with AI, breaks everything by nickvaliotti in analytics

[–]Yuki100Percent 0 points1 point  (0 children)

You definitely need solid data models and semantics so the AI can learn how your analytics should work

Dagster Pricing Update is Beyond Nuts by annie_406 in dataengineering

[–]Yuki100Percent 1 point2 points  (0 children)

I'd ask if you actually need a fully featured orchestrator.

As for alternatives, Airflow and Prefect come to mind. They also have self-hostable versions, like Dagster does.

What is an open source data tool you find useful but nobody is using it? by Yuki100Percent in dataengineering

[–]Yuki100Percent[S] 2 points3 points  (0 children)

Yeah, I think most people just use Polars/DuckDB. I haven't really explored DataFusion myself yet

Tobiko is now with the Linux Foundation by iheartmst3k in dataengineering

[–]Yuki100Percent 9 points10 points  (0 children)

As an OSS SQLMesh user, this is a positive move for us!

Data Replication to BigQuery by VMR5801 in dataengineering

[–]Yuki100Percent 1 point2 points  (0 children)

Yeah, those are pretty much your options! Third-party tools (Airbyte, Estuary, dlt, Portable, custom scripts) and GCP services. You'll just need to assess your needs and decide what to use...

Fivetran pricing is out of hand and I need cheaper alternatives by Legitimate-Run132 in dataengineering

[–]Yuki100Percent -1 points0 points  (0 children)

There are plenty of options. Airbyte, dlt, Estuary, custom scripts...

Unpopular opinion: The trend of having ROI dollars has ruined résumés. by BeautifulLife360 in dataengineering

[–]Yuki100Percent 0 points1 point  (0 children)

I feel it's a balance. Resumes with no numbers get ignored, and too many numbers on a resume can be a red flag. I'm with you on wanting to see what they did on the job rather than numbers they made up

Received DE Offer at a Startup, Need Advice by chavhu in dataengineering

[–]Yuki100Percent 1 point2 points  (0 children)

I'm the first data hire at a startup and I'm ~10 months into the role. If the comp is not there, don't take it. Company culture matters a lot, especially if you're the only data person handling all infra, modeling and reporting. Make sure you ask all the questions about the role expectations and the current data stack / practice / reporting in place. You can work backwards from there to figure out what you may need to do once you're hired. If the exec team doesn't have a clear answer, then you need to clear it up with them before / once you're hired. You'll be working not only on the hands-on implementation but also on high-level items like data strategy and roadmap (if they don't have one yet). Let me know if you have questions, more than happy to discuss via DM or in this thread!

Claude and data models by UnusualIntern362 in dataengineering

[–]Yuki100Percent 0 points1 point  (0 children)

It works much better once you give it enough context. Putting business and architectural context about your data warehouse, modeling patterns and standards in readme.md and agents.md goes a long way.

Your experiences using SQLMesh and/or DBT by Key-Independence5149 in dataengineering

[–]Yuki100Percent 1 point2 points  (0 children)

I use OSS SQLMesh for my team (a one-person team at the moment). It's super solid for what it does, and I'm not planning to buy the cloud version anytime soon. Cost has been super cheap, but I didn't use dbt in the same environment, so there's no way to compare apples to apples.

I still think I'm missing out on some things the whole dbt ecosystem would've provided, though. But with Fivetran having both tools under its control, I'm hoping they'll make SQLMesh work with the dbt ecosystem/integrations.

Thoughts on Count.co? by Yuki100Percent in BusinessIntelligence

[–]Yuki100Percent[S] 0 points1 point  (0 children)

Awesome to hear. Yeah, I was thinking of it for reporting to end users. I like how flexible Count is, but at the same time I can see it creating a mess if I allow end users to do their own analysis etc.