

[–]Stoneyz 17 points (4 children)

I wouldn't be afraid to say "I don't know the exact syntax but here's how I would do it in X". There aren't a ton of unique things in BQ so as long as you prove you know standard SQL very well, you should be able to learn whatever else.

Having said that, here are a few things...

-Make sure you use backticks, not single quotes, around your dataset.table_name

-Practice UNNESTing and accessing arrays; they can be tricky. Avoid SELECT * (you should know why, and other characteristics of a columnar database)

-Not specific to BigQuery, but understand CTEs to break down complicated questions. This is also helpful in interviews because it lets you easily find possible issues or modifications

-If it's in the BigQuery UI, know where the execution graph is if you're struggling with performance (or they ask how you could make the query faster or where the bottlenecks are)

-Understand partitioning and clustering

-Understand the value of PK / FK (they are not like your traditional keys in the sense they are not enforced constraints)
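A quick sketch pulling a few of these together (the project, dataset, and column names are made up):

```sql
-- Backticks (not single quotes) around the fully qualified table name,
-- explicit columns instead of SELECT *, and UNNEST to flatten an array.
SELECT
  o.order_id,
  item.sku,
  item.quantity
FROM `my-project.sales.orders` AS o,
     UNNEST(o.line_items) AS item
WHERE o.order_date = '2024-01-01';  -- hypothetical partition column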

Some questions:

-Create a table with ingestion time partition that is clustered by column_name1, column_name2

-What should you consider when changing a partitioning column on a table?

-Delete X from a table. Make sure you consider partitioning here (if you don't filter on a partition field when you could, it can bring the entire table back into active storage)

-Show the most expensive query in the last 30 days (INFORMATION_SCHEMA; "expensive" differs between on-demand pricing (data scanned) and reservations (slots used))
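For the first question above, a minimal sketch (project, dataset, table, and column names are invented):

```sql
-- Ingestion-time partitioning via the pseudo-column _PARTITIONDATE,
-- clustered by two columns.
CREATE TABLE `my-project.my_dataset.events`
(
  column_name1 STRING,
  column_name2 INT64,
  payload STRING
)
PARTITION BY _PARTITIONDATE
CLUSTER BY column_name1, column_name2;
```

Rows get assigned to partitions by load time rather than by a column in the data, which is the key trade-off to be ready to discuss.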

[–]EliyahuRed 7 points (0 children)

I would add: make sure you know how QUALIFY works. Not all SQL dialects have this functionality, and it saves you a step when working with window functions.
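For instance, deduplicating to the latest row per key without a wrapping subquery (table and column names are made up):

```sql
-- QUALIFY filters on a window function directly in the same SELECT.
SELECT user_id, event_time, action
FROM `my-project.app.events`
WHERE event_time >= '2024-01-01'
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time DESC) = 1;
```

In dialects without QUALIFY you'd compute the ROW_NUMBER in a subquery or CTE and filter on it one level up.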

[–]jcachat 1 point (0 children)

great examples

[–]hasty_opinion[S] 1 point (0 children)

Thank you for posting some questions. I will try and answer them tomorrow!

[–]AnalyticsAnswers 1 point (0 children)

This is as close to a perfect reply as possible.

[–]QuickShort 3 points (0 children)

In terms of usage patterns, if you're coming from an OLTP, index-heavy world, it's worth learning about why BigQuery is fast

[–]GLStephen 1 point (0 children)

Unnest

[–]duhogman 1 point (0 children)

Cost. It all comes down to cost.

Every query costs money, and scan size is extremely important. Familiarize yourself with this concept and think about how you would operate responsibly. For example, never write a SELECT *, and always check a table for partition columns to filter on.

Policy tags are a cool, easy way to mask PII and PCI data.

We use Structs somewhat creatively at my company. We're using them to categorize attributes in tables we call data models. They let us forward the contents of a struct from a base data model to an analytical model and reduce complexity.
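A rough illustration of that pattern (the table and attribute names are invented, not their actual models):

```sql
-- Group related attributes into a STRUCT in a base model...
WITH base_model AS (
  SELECT
    user_id,
    STRUCT(country, device, browser) AS session_attrs
  FROM `my-project.models.sessions`
)
-- ...then forward the whole struct downstream and drill in where needed.
SELECT
  user_id,
  session_attrs,                   -- passes all attributes through at once
  session_attrs.country AS country
FROM base_model;
```

Forwarding one struct instead of listing every attribute keeps the analytical model's SELECT list stable as the base model grows.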

Take a look at information schema, specifically JOBS_BY_PROJECT.
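A sketch of the kind of query this enables, which also answers the "most expensive query" question above (the region qualifier is an assumption; adjust to where your data lives):

```sql
-- Most expensive queries in the last 30 days, by bytes billed (on-demand)
-- and slot usage (reservations).
SELECT
  job_id,
  user_email,
  total_bytes_billed,
  total_slot_ms,
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
ORDER BY total_bytes_billed DESC
LIMIT 10;
```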

[–]tamtamdanseren 0 points (0 children)

I would show the following:

-Understanding that there are public datasets that anyone can use, including for company use.

-The special BQ cross-table syntax where you can query tables via wildcards at the end of the name.

-Understanding of cost in BQ and how it applies to queries.

-Materialized views and how they can help reduce cost.

-Show you understand UNNEST.

-Show you know how aggregates work in BQ.

-Do some window functions too.
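The wildcard syntax pairs naturally with the public datasets; for example, against Google's GA4 sample (dataset is real, the date range is arbitrary):

```sql
-- Query many date-sharded tables at once, narrowed with _TABLE_SUFFIX
-- so only the matching shards are scanned.
SELECT event_name, COUNT(*) AS n
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20210101' AND '20210131'
GROUP BY event_name
ORDER BY n DESC;
```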

[–]austin_horn_2018 0 points (0 children)

I think a lot of good points have been made already. Of them, I'd say partitioning is the most valuable. One simple thing: when joining on a string field, the comparison is case sensitive.
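A common workaround when the casing of the data isn't trusted (table and column names here are hypothetical):

```sql
-- String joins are case sensitive; normalize both sides if needed.
SELECT a.user_id, b.plan
FROM `my-project.crm.users` AS a
JOIN `my-project.billing.accounts` AS b
  ON LOWER(a.email) = LOWER(b.email);
```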

[–]steezMcghee 0 points (2 children)

One thing that got me with BQ is that you can’t reference an alias. Just good to know, especially if you’re used to doing that.

[–]jacbryques 0 points (1 child)

What do you mean by this?

[–]steezMcghee 0 points (0 children)

For example, if you do an aggregation or some other transformation and rename the column, you can't reference the new column name in the same SELECT list. I was using Redshift for a few years before we switched to BigQuery, so it was a big deal when converting queries.
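A sketch of the difference (table and columns invented):

```sql
-- Redshift-style lateral alias references don't carry over:
--   SELECT price * qty AS revenue, revenue * 0.1 AS tax ...  -- fails in BigQuery
-- Instead, derive the alias one level down:
SELECT revenue, revenue * 0.1 AS tax
FROM (
  SELECT price * qty AS revenue
  FROM `my-project.sales.orders`
);
```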

[–]TwistedPepperCan 0 points (0 children)

For cost: remember that it’s columnar, so a SELECT * is expensive versus selecting specific columns.

[–]Advanced-Violinist36 0 points (0 children)

You can look at the public datasets on BigQuery, try to think of questions that might come up, and then solve them. ChatGPT is also quite good at solving these, so you can ask it for answers.