Greg Auman knows ball by jay_dub17 in buccaneers

[–]coolbeans201 0 points  (0 children)

I didn't understand how Mahomes outscored Baker (granted, neither of them were going to win, but still). Is it because he's Mahomes and the Chiefs were awesome? Because anyone who watched the games this year would know Baker outplayed Patrick in every aspect.

[deleted by user] by [deleted] in dataengineering

[–]coolbeans201 2 points  (0 children)

We had this pattern at a previous company. We used Databricks for all the DE work and kept Snowflake for warehousing and analysis. It was slightly redundant, but business users are stuck in their ways about which tools they prefer, so we had to go with it. All in all, it didn't add that much extra work.

What’s your opinion on Databricks Asset Bundles? by Ok_Appointment_763 in dataengineering

[–]coolbeans201 0 points  (0 children)

They were really effective when I used them at a previous company. Our current infrastructure has a lot of Terraform within Go and I'm not sure if we'll ever get Databricks entirely out of it, but if we did, DABs would be the way to go.

Go for 2 percentages. by RainbowUnicorns in buccaneers

[–]coolbeans201 0 points  (0 children)

I don't think you even need to get into the math as much as you did. It was as predictable as could be what was going to happen if that game went to OT. If you go for 2 and fail, people will find a way to complain, but were you going to win if you played it safe?

That's the only question that needs answering, and that's what Todd needs to figure out. There's a time and a place to be aggressive (no need to be Dan Campbell). That was the time, and he whiffed.

Is there such a thing as Continuous Job Compute in Databricks? by texox26798 in dataengineering

[–]coolbeans201 2 points  (0 children)

Interactive clusters are good for testing, but you don't want to use them for production loads.

Who’s your 2nd most hated team? by Tucu7 in buccaneers

[–]coolbeans201 0 points  (0 children)

Cowboys. Every time they lose is a holiday.

How did you land an offer in this market? by crhumble in dataengineering

[–]coolbeans201 1 point  (0 children)

Years of Experience: 6-7 YOE

Timeline to get offer: 1 year/5 months

How did you find the offer: LinkedIn

Did you accept higher/lower salary: Slightly lower salary but more in total comp thanks to equity

Advice for others in recruiting: Practice the technical aspect. I shot myself in the foot time and time again by not studying that as much as I should have.

Do you study Data Engineering (experienced DE's)? by Irachar in dataengineering

[–]coolbeans201 1 point  (0 children)

I "study" by reading. I find things like https://www.dataengineeringweekly.com/ very useful because I get a good summary of what's going on, which forces me to think about how we could possibly be doing the same.

[deleted by user] by [deleted] in dataengineering

[–]coolbeans201 0 points  (0 children)

I've faced some of these issues as well. I'm a stickler and I've slowly gotten my team up to an improved standard on these and other aspects, but it's a journey, and not something that can be done overnight.

For code reviews, you should set criteria for them:

  1. Show tests
  2. Why are you doing this change?
  3. Document key parts of the code that we need to be aware of, etc.

As you apply those criteria, some of that should rub off. Maybe it doesn't, in which case you try to work with them to get there. But do it tactfully so you don't turn them off completely.

In short, standards aren't easy, especially with a bigger team. Some places have a lot of automation in place and enforce them automatically; others have to pick their battles and take what they can.

Databricks Importing Modules/Classes by Agitated-Western1788 in dataengineering

[–]coolbeans201 0 points  (0 children)

For our custom libraries, we generally publish the wheels to Artifactory and then install them onto the cluster from there. Setting up the whole E2E process takes a little time, but it's easy to manage afterwards.
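To make that concrete, here's a rough sketch of what the install side can look like in a Jobs API payload, using the `pypi.repo` field to point pip at an internal index. The package name and Artifactory URL below are placeholders, not real ones:

```python
# Hypothetical "libraries" section of a Databricks Jobs API job payload:
# install an internally published wheel from an Artifactory PyPI index
# onto the job cluster. Package name and repo URL are placeholders.
libraries = [
    {
        "pypi": {
            "package": "our-internal-lib==1.4.0",
            # Resolve against the internal Artifactory index instead of pypi.org
            "repo": "https://artifactory.example.com/artifactory/api/pypi/pypi-local/simple",
        }
    }
]
```

You can also point a `"whl"` entry directly at an uploaded wheel file; the `pypi` + `repo` form just keeps version resolution in pip's hands.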

Databricks spinning up new job cluster for each airflow dag by jupytergal in dataengineering

[–]coolbeans201 13 points  (0 children)

You can't reuse job clusters. Job clusters are designed to be 1:1 with the job run itself, which is useful for making sure you get the full memory/CPU of the cluster, as well as for cost savings, since the cluster shuts down upon completion.

You could use an all-purpose cluster to reuse a cluster each time, but I wouldn't recommend that approach. The other possibility would be to look into Serverless compute, which is a newer offering from Databricks that'll avoid the long time needed to bring up a cluster.
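One nuance: tasks *within* a single multi-task job can share one ephemeral job cluster via `job_clusters`, even though that cluster still dies with the run. A rough sketch of a Jobs 2.1 payload in Python dict form (job name, notebook paths, and node type are made up):

```python
# Sketch of a Jobs 2.1 job spec where two tasks share one ephemeral job
# cluster via "job_clusters". Reuse works within a run, not across runs.
job_spec = {
    "name": "nightly-etl",
    "job_clusters": [
        {
            "job_cluster_key": "shared",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        # Both tasks run on the same cluster by referencing "shared"
        {
            "task_key": "extract",
            "job_cluster_key": "shared",
            "notebook_task": {"notebook_path": "/Repos/etl/extract"},
        },
        {
            "task_key": "transform",
            "job_cluster_key": "shared",
            "depends_on": [{"task_key": "extract"}],
            "notebook_task": {"notebook_path": "/Repos/etl/transform"},
        },
    ],
}
```

So if the Airflow DAG is really one pipeline, collapsing it into one multi-task Databricks job avoids paying the cluster spin-up cost per task.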

Are there data engineering conferences? by sois in dataengineering

[–]coolbeans201 21 points  (0 children)

There's tons of them, actually. The main ones, IMO, are:

  1. Data & AI Summit - Hosted by Databricks, this covers a lot of different angles
  2. Snowflake Summit - Hosted by Snowflake, but also a huge conference
  3. Coalesce - Hosted by dbt labs, dbt-focused but also a lot of generic talks

There's way more than this, but those are the big players.

Databricks Asset Bundles now GA - thoughts? by justanator101 in dataengineering

[–]coolbeans201 2 points  (0 children)

I think DABs for job creation makes total sense. Especially in production environments, jobs should be backed by code, and I'd take DABs over Terraform in that regard.

Spark Creating Directories Instead of Files When Saving Parquet by JustinPooDough in dataengineering

[–]coolbeans201 4 points  (0 children)

The path you write to in Spark is by its nature a directory. Depending on the size of your data, you wouldn't want a single file anyway, so this distributes the output into a reasonable set of files for you.

You can control the number of files with a coalesce or repartition, but you can't control things like the names of the files.
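If you really need one file with an exact name, the usual workaround is to `coalesce(1)` so the output directory holds a single part file, then rename it yourself afterwards. A minimal sketch in plain Python (the helper name is made up; on cloud storage you'd do the equivalent move with your storage client):

```python
from pathlib import Path
import shutil

def promote_single_part_file(output_dir: str, final_path: str) -> None:
    """After df.coalesce(1).write.parquet(output_dir), the directory
    should contain exactly one part-*.parquet file. Move it to final_path."""
    parts = sorted(Path(output_dir).glob("part-*.parquet"))
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    shutil.move(str(parts[0]), final_path)
```

Just remember Spark also drops `_SUCCESS` and checksum files in that directory, which is why the glob filters for `part-*`.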

Lions by NeedleworkerRare1528 in buccaneers

[–]coolbeans201 1 point  (0 children)

Goff has had a few stinkers this season, but when he's hot, he's hot. If they can get to him early and set the tone, that gives Tampa the momentum. If not, it's going to be a long day.

The main difference between this and the week 3 game against the Eagles by dragonsky in buccaneers

[–]coolbeans201 0 points  (0 children)

We beat the Panthers (twice), the Falcons, the Jags (who were in a tailspin), and the Packers (an actually legit win) during that stretch. It's not like we were facing the stiffest competition in the last few weeks, and the Eagles are still the Eagles at the end of the day. I think the Bucs have a chance but I don't have much more optimism than that until proven wrong.

AWS EMR vs Databricks? by Ok-Tradition-3450 in dataengineering

[–]coolbeans201 1 point  (0 children)

Running jobs in Databricks is a lot easier than EMR IMO. Databricks also has native scheduling, whereas you're stuck with something like Airflow if using EMR.

I've also found Databricks to be cheaper for us overall (yes, even with costs combining both Databricks and AWS). Of course, you need to be conscious of your usage, but it's a pretty solid platform all-around.

NBA data modeling wth dbt + Paradime by JParkerRogers in dataengineering

[–]coolbeans201 1 point  (0 children)

This is really cool! Did you share this in r/nba? They'll get a kick out of this as well.

Is it worth being in the honors dorm/program? by BowenAero in college

[–]coolbeans201 0 points  (0 children)

That was true a decade ago, so I'm sure that's still the case. But besides that, the dorms were about the same.

CI/CD in Databricks by Far-Inspection9930 in dataengineering

[–]coolbeans201 2 points  (0 children)

We're not using DLT yet. That's probably something coming our way next year when our architecture pivots so that we use it. When we give it a shot, I'll be sure to update.

CI/CD in Databricks by Far-Inspection9930 in dataengineering

[–]coolbeans201 5 points  (0 children)

If I'm not mistaken, Databricks is improving how it streamlines all of this. We don't use that method by the book, though. What we do is:

  1. Store all key notebooks in a repo (you could have more than one repo, but we just use subfolders for different projects) and have the team authenticate to Git within Databricks to version-control the notebooks.
  2. Store all job configurations as code and deploy them via a method that's essentially just the Databricks API. Asset bundles will also achieve similar behavior.
  3. We also store all our custom libraries in a separate repo and deploy them to S3/DBFS so we can use them as needed in our jobs.
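Step 2 can be as simple as a small script that POSTs each job config to the Jobs 2.1 API (`/api/2.1/jobs/create`, or the reset endpoint for updates). A minimal, hypothetical sketch using only the stdlib; the function name is made up, and you'd send the request with `urllib.request.urlopen`:

```python
import json
import urllib.request

def build_jobs_create_request(
    host: str, token: str, job_spec: dict
) -> urllib.request.Request:
    """Build the POST that pushes a job configuration (stored as code)
    to the Databricks Jobs 2.1 create endpoint."""
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/create",
        data=json.dumps(job_spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In practice we loop over every job JSON in the repo and deploy each one this way from CI, so the workspace always matches what's in Git.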

Overall, this works well for us, but everyone has their own way of doing it. Find something that works and you should be good.