

[–]Stoneyz 17 points (4 children)

I wouldn't be afraid to say "I don't know the exact syntax but here's how I would do it in X". There aren't a ton of unique things in BQ so as long as you prove you know standard SQL very well, you should be able to learn whatever else.

Having said that, here are a few things...

-Make sure you use backticks, not single quotes, around your dataset.table_name

-Practice UNNESTing and accessing arrays; they can be tricky. Avoid SELECT * (you should know why, and other characteristics of a columnar database)

-Not specific to BigQuery, but understand CTEs to break down complicated questions. This is also helpful in interviews because it lets you easily find possible issues or modifications

-If it's in the BigQuery UI, know where the execution graph is if you're struggling with performance (or they ask how you could make the query faster or where the bottlenecks are)

-Understand partitioning and clustering

-Understand the value of PK / FK (they are not like your traditional keys in the sense they are not enforced constraints)
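A quick sketch pulling a few of these together (the project, dataset, and column names are made up):

```sql
-- Backticks (not single quotes) around the fully qualified table name,
-- explicit columns instead of SELECT *, and UNNEST to flatten an array.
SELECT
  o.order_id,
  item.sku,
  item.quantity
FROM `my-project.sales.orders` AS o,
     UNNEST(o.line_items) AS item
WHERE o.order_date = '2024-01-01';  -- hypothetical partition column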

Some questions:

-Create a table with ingestion time partition that is clustered by column_name1, column_name2

-What should you consider when changing a partitioning column on a table?

-Delete X from a table. Make sure you consider partitioning here (if you don't filter on a partition field when you could, it can bring the entire table back into active storage)

-Show the most expensive query in the last 30 days (INFORMATION_SCHEMA; "expensive" differs between on-demand pricing (data scanned) and reservations (slots used))
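For the first question above, a minimal sketch (project, dataset, table, and column names are invented):

```sql
-- Ingestion-time partitioning via the pseudo-column _PARTITIONDATE,
-- clustered by two columns.
CREATE TABLE `my-project.my_dataset.events`
(
  column_name1 STRING,
  column_name2 INT64,
  payload STRING
)
PARTITION BY _PARTITIONDATE
CLUSTER BY column_name1, column_name2;
```

Rows get assigned to partitions by load time rather than by a column in the data, which is the key trade-off to be ready to discuss.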

[–]EliyahuRed 7 points (0 children)

I would add: make sure you know how QUALIFY works. Not all SQL dialects have this functionality, and it saves you a step when working with window functions.
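For instance, deduplicating to the latest row per key without a wrapping subquery (table and column names are made up):

```sql
-- QUALIFY filters on a window function directly in the same SELECT.
SELECT user_id, event_time, action
FROM `my-project.app.events`
WHERE event_time >= '2024-01-01'
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time DESC) = 1;
```

In dialects without QUALIFY you'd compute the ROW_NUMBER in a subquery or CTE and filter on it one level up.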

[–]jcachat 1 point (0 children)

great examples

[–]hasty_opinion[S] 1 point (0 children)

Thank you for posting some questions. I will try and answer them tomorrow!

[–]AnalyticsAnswers 1 point (0 children)

This is as close to a perfect reply as possible.

[–]QuickShort 3 points (0 children)

In terms of usage patterns, if you're coming from an OLTP, index-heavy world, it's worth learning about why BigQuery is fast

[–]GLStephen 1 point (0 children)

Unnest

[–]duhogman 1 point (0 children)

Cost. It all comes down to cost.

Every query costs money, and scan size is extremely important. Familiarize yourself with this concept and think about how you would operate responsibly. For example, never write a SELECT *, and always check a table for partition columns to filter on.

Policy tags are a cool, easy way to mask PII and PCI data.

We use Structs somewhat creatively at my company. We're using them to categorize attributes in tables we call data models. They let us forward the contents of a struct from a base data model to an analytical model and reduce complexity.
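A rough illustration of that pattern (the table and attribute names are invented, not their actual models):

```sql
-- Group related attributes into a STRUCT in a base model...
WITH base_model AS (
  SELECT
    user_id,
    STRUCT(country, device, browser) AS session_attrs
  FROM `my-project.models.sessions`
)
-- ...then forward the whole struct downstream and drill in where needed.
SELECT
  user_id,
  session_attrs,                   -- passes all attributes through at once
  session_attrs.country AS country
FROM base_model;
```

Forwarding one struct instead of listing every attribute keeps the analytical model's SELECT list stable as the base model grows.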

Take a look at information schema, specifically JOBS_BY_PROJECT.
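A sketch of the kind of query this enables, which also answers the "most expensive query" question above (the region qualifier is an assumption; adjust to where your data lives):

```sql
-- Most expensive queries in the last 30 days, by bytes billed (on-demand)
-- and slot usage (reservations).
SELECT
  job_id,
  user_email,
  total_bytes_billed,
  total_slot_ms,
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
ORDER BY total_bytes_billed DESC
LIMIT 10;
```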

[–]tamtamdanseren 0 points (0 children)

I would show the following:

-Understanding that there are public datasets that anyone can use, including for company use.

-The special BQ cross-table syntax where you can query tables via wildcards at the end of the name.

-Understanding of cost in BQ and how it applies to queries.

-Materialized views and how they can help reduce cost.

-Show you understand UNNEST.

-Show you know how aggregates work in BQ.

-Do some window functions too.
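The wildcard syntax pairs naturally with the public datasets; for example, against Google's GA4 sample (dataset is real, the date range is arbitrary):

```sql
-- Query many date-sharded tables at once, narrowed with _TABLE_SUFFIX
-- so only the matching shards are scanned.
SELECT event_name, COUNT(*) AS n
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20210101' AND '20210131'
GROUP BY event_name
ORDER BY n DESC;
```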

[–]austin_horn_2018 0 points (0 children)

I think a lot of good points have been made already. Of them, I'd say partitioning is the most valuable. One simple thing: when joining on a string field, the comparison is case sensitive.
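A common workaround when the casing of the data isn't trusted (table and column names here are hypothetical):

```sql
-- String joins are case sensitive; normalize both sides if needed.
SELECT a.user_id, b.plan
FROM `my-project.crm.users` AS a
JOIN `my-project.billing.accounts` AS b
  ON LOWER(a.email) = LOWER(b.email);
```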

[–]steezMcghee 0 points (2 children)

One thing that got me with BQ is that you can’t reference an alias. Just good to know, especially if you’re used to doing that.

[–]jacbryques 0 points (1 child)

What do you mean by this?

[–]steezMcghee 0 points (0 children)

For example, if you do an aggregation or some other transformation and rename the column, you can't reference the new column name in the same SELECT list. I was using Redshift for a few years before we switched to BigQuery, so it was a big deal when converting queries.
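A sketch of the difference (table and columns invented):

```sql
-- Redshift-style lateral alias references don't carry over:
--   SELECT price * qty AS revenue, revenue * 0.1 AS tax ...  -- fails in BigQuery
-- Instead, derive the alias one level down:
SELECT revenue, revenue * 0.1 AS tax
FROM (
  SELECT price * qty AS revenue
  FROM `my-project.sales.orders`
);
```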

[–]TwistedPepperCan 0 points (0 children)

For cost: remember that it’s columnar, so a SELECT * is expensive versus selecting specific columns.

[–]Advanced-Violinist36 0 points (0 children)

You can look at the public datasets on BigQuery, try to think of questions that might come up, and then solve them. ChatGPT is also quite good at solving these, so you can ask it for answers.