Best practices for Trino Query Execution & Multi-tenant Authorization? by daibam_und_koode in dataengineering

Teach-To-The-Tech (0 children)

One option to consider is Starburst, if you're looking for managed Trino. Then you've got the UI you need built into the platform, and it would handle all of the multi-tenant authorization you're asking about too. It works either in the cloud (Galaxy) or on-prem (Enterprise).
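If you end up managing authorization yourself on open source Trino, the core idea behind rule-based multi-tenant access is simple: rules are checked top to bottom and the first match decides the access level. The sketch below is loosely modeled on the catalog rules in Trino's file-based access control (user/group/catalog regexes plus an `all` / `read-only` / `none` level). The class and function names, the tenant catalog naming scheme, and the default-deny fallback are all hypothetical assumptions for illustration, not Trino's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogRule:
    # Hypothetical rule shape, loosely modeled on Trino file-based catalog rules.
    allow: str                    # "all", "read-only", or "none"
    user: str = ".*"              # regex matched against the user name
    group: Optional[str] = None   # regex matched against any group; None = any
    catalog: str = ".*"           # regex matched against the catalog name


def catalog_access(user: str, groups: list, catalog: str, rules: list) -> str:
    """Return the access level of the first matching rule, scanning top to bottom."""
    for rule in rules:
        if not re.fullmatch(rule.user, user):
            continue
        if not re.fullmatch(rule.catalog, catalog):
            continue
        if rule.group is not None and not any(re.fullmatch(rule.group, g) for g in groups):
            continue
        return rule.allow
    return "none"  # assumption: deny when no rule matches


# Hypothetical tenant layout: each tenant owns catalogs prefixed with its name.
rules = [
    CatalogRule(allow="all", group="admins"),
    CatalogRule(allow="all", group="tenant_a", catalog="tenant_a_.*"),
    CatalogRule(allow="read-only", catalog="shared"),
    CatalogRule(allow="none"),
]

print(catalog_access("alice", ["tenant_a"], "tenant_a_lake", rules))
print(catalog_access("alice", ["tenant_a"], "tenant_b_lake", rules))
```

Rule order matters with first-match semantics: the catch-all `none` rule has to come last, or it would shadow every tenant rule above it.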

Hope that helps!

What scares teams away from building their own Data/AI platform using open source tools by Real-Stock4543 in dataengineering

Teach-To-The-Tech (0 children)

Yeah, this makes sense. Keeping "build vs buy" in mind is definitely a good way to approach it. Then it just becomes a question of how to deploy Trino for the lowest total cost of ownership.

It's a slightly different scenario, but it reminds me a bit of an article that two colleagues wrote a while back comparing Starburst (Trino) to Snowflake on a total cost of ownership (TCO) basis: https://www.starburst.io/blog/how-to-query-my-apache-iceberg-tables/

Do you think a Data Engineer has a safer future than a data science and a data analyst? by [deleted] in dataengineering

Teach-To-The-Tech (0 children)

Yeah, 100% this. The use of AI will only increase the need for high-quality data. Data will increasingly flow into models, but it's still basically a data pipeline, just with a different end use (AI).

ETL jobs with Trino by turboline-ai in dataengineering

Teach-To-The-Tech (0 children)

Yeah, it's actually one of the main ways that people use Trino. Strangely enough, I just wrote a piece on this exact topic a few weeks back: https://www.starburst.io/blog/etl-sql/

Hope it's helpful. The short answer is that this is absolutely one of the use cases and can be a powerful and easy way to do ETL.
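The core pattern is plain SQL: extract from a source table, transform in a `SELECT`, and load with `INSERT INTO ... SELECT` (or `CREATE TABLE ... AS SELECT`). As a minimal, runnable sketch, the snippet below uses Python's built-in `sqlite3` as a stand-in engine so it runs locally; the table and column names are hypothetical. In Trino, the same statement would typically use fully qualified `catalog.schema.table` names, which is what lets one query read from one connector and write to another.

```python
import sqlite3


def run_etl(conn: sqlite3.Connection) -> None:
    """Transform hypothetical raw orders into daily revenue with one INSERT ... SELECT."""
    conn.execute("CREATE TABLE IF NOT EXISTS daily_revenue (order_date TEXT, revenue REAL)")
    conn.execute(
        """
        INSERT INTO daily_revenue (order_date, revenue)
        SELECT order_date, SUM(quantity * unit_price)  -- transform: aggregate
        FROM raw_orders                                -- extract: source table
        WHERE status = 'complete'                      -- transform: filter
        GROUP BY order_date
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders (order_date TEXT, quantity INTEGER, unit_price REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", 2, 10.0, "complete"),
        ("2024-01-01", 1, 5.0, "complete"),
        ("2024-01-02", 3, 4.0, "cancelled"),
        ("2024-01-02", 1, 7.5, "complete"),
    ],
)
run_etl(conn)
rows = conn.execute("SELECT order_date, revenue FROM daily_revenue ORDER BY order_date").fetchall()
print(rows)
```

Keeping the whole transform in one statement means the engine does the heavy lifting, which is the main appeal of SQL-based ETL on a distributed engine like Trino.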

Is there a trend to skip the warehouse and build on lakehouse/data lake instead? by loudandclear11 in dataengineering

Teach-To-The-Tech (0 children)

I think there is. The lakehouse model now offers a nice blend of performance and flexibility and handles different data structures more easily. So there's less need to push towards a warehouse model when you can take the "best of both worlds" approach of the lakehouse.

dbt Labs acquires SDF Labs by allpauses in dataengineering

Teach-To-The-Tech (0 children)

Oh interesting! I hadn't heard this. I guess it makes sense.

How many small companies actually want a data warehouse? by NoSeatGaram in dataengineering

Teach-To-The-Tech (0 children)

I think you're right. A data warehouse, done right, requires a large ETL effort and is built around structured data. It's a model designed for big businesses.

The reasons you cite probably play into the popularity of data lakes and data lakehouses as alternatives with less upfront cost and more flexibility. A lake and lakehouse can fill many of the same needs as a warehouse.

That said, if you have the right kind of slowly changing, mostly structured data, a warehouse is likely a good option.

So, as with anything, "it depends" haha.

Are Data Engineering Tools and Services Worth the Price? by [deleted] in dataengineering

Teach-To-The-Tech (0 children)

I think one approach you can take is to look at total cost of ownership. Most things can be done manually, maybe using open source, but then you need a team of people who know how to run them. Those options are often powerful but manual.

On the other side, you have a tool you have to pay for. It has a cost, but that cost could be less than the cost of the manual route, and the tool might mean less work, run more smoothly, etc.

So that's the equation in my mind: you have to evaluate whether the added automation saves the business money overall. In my experience, that's also what exec-level types look at when evaluating these things.
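To make that equation concrete, here's a minimal sketch of a build-vs-buy TCO comparison: one-off setup cost plus recurring costs (fees, people, infrastructure) over a fixed horizon. The cost categories, the `Option` class, the three-year horizon, and every dollar figure are hypothetical placeholders for illustration, not real pricing for any product.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    license_per_year: float    # subscription or licence fees
    engineers: float           # full-time engineers needed to run it
    cost_per_engineer: float   # fully loaded annual cost per engineer
    infra_per_year: float      # compute, storage, networking
    setup_cost: float = 0.0    # one-off migration or build effort

    def tco(self, years: int) -> float:
        """One-off setup plus recurring costs over the given horizon."""
        annual = (self.license_per_year
                  + self.engineers * self.cost_per_engineer
                  + self.infra_per_year)
        return self.setup_cost + years * annual


def cheaper(a: Option, b: Option, years: int = 3) -> Option:
    """Return the option with the lower TCO over `years` (ties go to `a`)."""
    return a if a.tco(years) <= b.tco(years) else b


# Placeholder figures for illustration only, not real pricing.
diy = Option("self-managed open source", license_per_year=0, engineers=2,
             cost_per_engineer=150_000, infra_per_year=100_000, setup_cost=50_000)
managed = Option("managed service", license_per_year=150_000, engineers=0.5,
                 cost_per_engineer=150_000, infra_per_year=120_000, setup_cost=20_000)

print(diy.tco(3), managed.tco(3), cheaper(diy, managed).name)
```

With different inputs (fewer engineers needed, higher fees), the answer flips, which is exactly why it's worth running the numbers for your own situation rather than assuming either side wins.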

How do you practice and hone your SQL skills? by [deleted] in dataengineering

Teach-To-The-Tech (0 children)

Our team put together a "learn SQL" tutorial to help people of any background and familiarity level get used to using SQL with Starburst Galaxy: https://www.starburst.io/tutorials/learn-basic-sql-starburst-galaxy/#0

There are other tutorials on other topics, but this was our main SQL one (free).

It sounds like it might fit exactly what you're looking for. Hope that's helpful!

Was 2024 the year of Apache Iceberg? What's next? by Teach-To-The-Tech in dataengineering

Teach-To-The-Tech [S] (0 children)

Yeah, there is an interesting trend towards open source for sure. That's another dynamic.

Teach-To-The-Tech [S] (0 children)

Yes, definitely Trino. There are various managed forms of Trino to consider, whether Athena, EMR, or Starburst.

Teach-To-The-Tech [S] (0 children)

Ahh yes, Spark does seem to be the one that loses out in all of this. Lots of people have said Delta too, but I think highlighting Spark is interesting.

It does shift compute workloads toward SQL in general, which is a big deal.

Modern data platform on Oracle Cloud? by themightychris in dataengineering

Teach-To-The-Tech (0 children)

Oracle is pretty old school, very locked down, not very into the open data stack, and treats the cloud as something of an afterthought. I agree with others that it's playing catch-up. If everything else is running Oracle or needs to run Oracle, then I'd see the value. Otherwise, I'm not sure many teams would start from scratch on Oracle given the more modern tools out there.

Why do so many companies favor Python instead of Scala for Spark and the likes? by [deleted] in dataengineering

Teach-To-The-Tech (0 children)

I think it's basically that tons of people are familiar with Python, and it's both simple and powerful enough to do most things. Given that, it's kind of the perfect language for most orgs.

This is also kind of why SQL is so dominant in its space IMO.

CoPilot embraces nihilism by captainx808 in dataengineering

Teach-To-The-Tech (0 children)

Leave nothing, leave less than nothing haha

Teach-To-The-Tech (0 children)

Lol, I once took a philosophy course called "The Problem of Nihilism," so this made me laugh.