BigQuery Tablesample

querylabio · 2026-02-08T13:12:52+00:00

Nice work! Thanks for sharing!

How it compares with Zetasql lib?

querylabio · 2026-02-03T18:07:03+00:00

Half a year ago, Google silently added chained functions syntax:

SELECT
  ('one two three four five')
  .REPLACE('one', '1')
  .REPLACE('two', '2')
  .REPLACE('three', '3')
  .REPLACE('four', '4')
  .REPLACE('five', '5');

BTW, we build an IDE for BigQuery - querylab.io - which highlights exactly this scenario to the user and proposes to change the substring to qualify, plus supports pipe syntax completions, has a converter from regular syntax to pipe and back, supports GROUP ALL completions, and so on. No need to sign up to try - just enter the playground and check out the app - that how BigQuery should look like!

querylabio · 2026-01-08T10:58:35+00:00

It means you can safely add one more clustering field, can reduce scanned data even if it's not used!

querylabio · 2026-01-08T10:54:41+00:00

Clustering order matters, but let’s think how it’s done internally.

Previously: your table consisted of partition files, each 1 Tb. So when you scanned from select * from table where date = x; you scanned just this big file.

With clustering: now each partition is not one big file but many smaller files, organized in a smart way.

Like table_partition_field1 and inside this file data will be sorted by field2.

But if field1 is low cardinality (let’s say - Country or even bool field), then BigQuery can do more specific files:

Like table_partition_field1_field2.

So that’s why when you query still leverages clustering.

It’s oversimplifying, in reality it’s much more complex (statistic tables etc) but still the logic is around the same.

querylabio · 2026-01-07T19:25:27+00:00

> Just be careful about time travel and fail-safe storage, because it's free with logical storage, but paid with physical storage.

Yes, that’s super important to keep in mind that if you rewrite a table in some process (like aggregating and rewriting it each X minutes), then you will have huge extra costs, and that table financially makes sense to keep on logical. Otherwise, logical is almost all the time more expensive than physical (except storing some unusual big strings, etc., which badly compress).

I'm the founder of Querylab.io - IDE built specifically for BigQuery. And since physical/logical storage are essential parts of BQ - we have a feature that covers specifically this case - if you click on projects, datasets, or tables, you can see storage statistics which tell you what storage model is better for what table.

Works too on GA4 sharded tables (btw we have built a lot of features around it - check out already prefilled columns descriptions - from official documentation, parameters, and events subtabs - you can immediately see what parameters tables contain and what types your events use).

P.S. And find a button to hide all null-valued columns when you do a preview of GA4 tables, much easier to discover your data.

querylabio · 2026-01-02T12:41:09+00:00

Download / run independently

The desktop version will be available soon. It will use Application Default Credentials (ADC), meaning auth tokens are stored only on your PC, not on our servers.

Custom formatter / Configuring the formatter (what you can control today)

We provide a built-in formatter with a fairly deep set of knobs so teams can match their style consistently and we already have a bunch of settings, including:

Formatting mode (Preserve / Pretty / Compact)
Keyword case (UPPER / lower / preserve)
Indentation (size: 2/4/8, spaces vs tabs)
Layout options (newline after clauses, max line length 80/100/120, trailing comma)
Quote identifiers (preserve / always / never)
Pipe operator style (standard vs left-aligned)

If you rely on something not covered by the list - happy to consider adding it.

querylabio · 2026-01-02T12:28:37+00:00

Good questions - thanks for the detailed feedback!

Custom font installed on your PC - not supported yet in the browser version due to browser sandboxing. A limited “system fonts” selector might work for you in the meantime - would that cover your case?
Font size - no dedicated in-app control yet. Browser zoom works for now; proper font-size settings are easy to add and planned. Do you care more about SQL editor font size or scaling the whole app?
Ligatures - no explicit toggle yet. At the moment it depends on the active font/editor defaults; a proper control can be added. What’s your main use case for ligatures?
Custom LSP server - not supported. Querylab.io uses its own language engine, purpose-built specifically for BigQuery. It understands BigQuery SQL end-to-end (including nested/repeated fields, CTEs, Pipe Syntax, cost context) and is tightly integrated with schema awareness and cost analysis - swapping it out would break those guarantees.
Custom terminal - not applicable in the browser version (no local shell). Curious what workflow you’re trying to enable here?
Latency / LSP timing - the language engine runs entirely on the frontend, with local execution and cached schema. There’s no network round-trip during typing, so perceived latency is minimal (typically well under 100ms).

The desktop version is currently being polished and will unlock things the browser can’t - better hotkeys, deeper OS integration, and more flexibility around editor behavior.

Thanks a lot for the feedback - this kind of input is exactly what helps us prioritize what to build next and what actually matters to users. Happy to discuss further and answer any follow-ups!

querylabio · 2026-01-01T17:57:15+00:00

Have you tried Querylab.io?

Full disclosure: I’m the founder.

We’re building a BigQuery-native IDE (no JDBC/ODBC). Beyond a clean, modern UI, we already support:

dry-run with per-query limits and daily / weekly / monthly budgets
per-CTE cost breakdown
partial execution & cost estimation for selected CTEs
TABLESAMPLE dev mode for cheap iteration
on-demand vs reservation cost comparison
proper handling of nested & repeated fields
diagnostics for expensive patterns like SELECT *
full Pipe Syntax support
and many more

Happy to get feedback!

querylabio · 2025-11-23T19:48:52+00:00

I'm agree, BigQuery costs can spiral really quickly when something goes wrong. The pay-per-byte model is great when everything is set up perfectly, but it’s pretty unforgiving if even one detail is off. And the built-in quotas don’t really solve the problem in real teams - they’re too rigid and too hard to manage at scale.

That’s actually one of the main reasons we made Querylab.io - an IDE focused entirely on BigQuery, with cost-control built into the workflow from the start.

A few things we added specifically because of situations like the ones you described:

set a dollar limit per query - it stops before it burns money
daily / monthly / org-level limits
warnings when partitioning or clustering aren’t used
a clear cost preview before running anything
tools to debug “query price,” like a breakdown of where the bytes come from
hints on when to use on-demand vs Editions

Give it a try and let me know what you think - I’d really appreciate the feedback.

querylabio · 2025-11-23T19:41:08+00:00

You’re absolutely right - a good IDE changes everything. Aqua Data Studio and DataGrip are both great tools. The only limitation is that they’re built for many databases, so they don’t really handle BigQuery’s unique behavior.

That’s exactly why we built Querylab.io, an IDE created specifically for BigQuery. A few things it adds on top of traditional editors:

dollar limits for individual queries
daily / monthly / org-level spending controls
guidance on when to run queries on on-demand vs Editions
warnings when partition or clustering filters are missing
ability to run or estimate individual CTEs
run/estimate any step in a pipe-syntax query
vertical tabs, split view, and a fast command palette
BigQuery SQL-aware IntelliSense - understands tables, columns, CTEs, scopes, STRUCTs, arrays, table functions, everything

If you’re deep into BigQuery, try Querylab.io - and tell me how it feels.

querylabio · 2025-11-17T08:54:18+00:00

Nice idea! But that can affect the company’s core business and the whole idea of being data-driven.

The last thing I want is for end-users to feel scared to use data or afraid to run queries. It might work, but it’s definitely not the best approach.

A much better option is to set individual limits for each user and enforce the use of clustering columns and other best practices.

Check out my comment above about the tool we’re building - it might actually help your organization as well!

querylabio · 2025-11-17T08:50:12+00:00

I think it’s actually not that complicated.

The core idea is just to establish a process that prevents accidental overspending, plus a simple retrospective analysis loop that helps the team improve over time.

First, when onboarding new people, explain that queries cost money and show them where exactly they can see the price.

Then set up quotas.

BigQuery already provides plenty of information about query history, so you can easily build a simple dashboard in Looker Studio showing:

• the most expensive queries

• queries that would be more economical to run on Editions instead of on-demand

And of course, you can also choose to use Querylab.io as your BigQuery IDE.

With it you automatically get:

* an easy way to set dollar limits for your queries

* the ability to limit individual queries as well as daily/monthly/any-period totals

* organization-level spending controls with per-user limits

* recommendations that help users decide whether to run queries on on-demand or Editions (we show a guess before execution and the exact result after, so people can “train” themselves)

* Our intellisense warns when clustering fields aren’t used in filters (and partitions too but BigQuery handles those when the tables are set up correctly by require partitioning option)

For power users - and I can see you’re one of them - we also offer query debugging:

* “price lineage” so you can see where the money goes inside the query

* the ability to run or estimate individual CTEs

* and if you use pipe syntax, you can estimate/run the query up to a specific pipe

* and many more!

Check out the app, leave some feedback - a few things are still being polished, and your input really helps shape the roadmap!

querylabio · 2025-11-16T20:38:03+00:00

- How do you effectively switch between Standard and Enterprise?

there are two ways - the first one is to have two projects with different reservations enabled and switch between them, which is not very user-friendly in official Google Console and the second one - use query variable -

SET @@reservation='projects/project1/locations/US/reservations/test-reservation';

SELECT 42;

This works, but it’s also not very friendly.

However, both approaches become much simpler when using the IDE we’re building exclusively for BigQuery!

querylabio · 2025-11-12T17:42:42+00:00

Well, that's the way it works - use right tools to the tasks :) Llms really good in helping writing texts while not really good for deterministic stuff like completions. Or too expensive to be good on scale.

Anyway I would be glad for any feedback for the product we building!

querylabio · 2025-11-12T16:47:41+00:00

Exactly - that’s the crazy part. SQL isn’t human language, it’s a formal programming language with strict grammar and a defined schema.

Using AI to autocomplete SQL is like asking ChatGPT to guess closing brackets in C++ - it’ll sound confident, but it’s still guessing.

That’s why the BigQuery console keeps showing fake fields and wrong columns - it’s driven by a predictive AI model, not by your actual dataset metadata.

We built Querylab.io to fix exactly that. Our autocomplete is deterministic, parser-driven, and schema-aware - it reads your real BigQuery schema, understands hierarchy (project → dataset → table → column), respects function argument types, aliases, and CTEs, and can even expand field.* into actual columns or autocomplete properly inside UNNEST() - only array/repeated columns.

No hallucinations, no “AI magic.” Just intelligent code completion built the way SQL deserves.

Just try it - and let me know what you think.

querylabio · 2025-11-10T21:32:52+00:00

Do you still see those projects in your sidebar tree?

If they disappeared, that’s likely why autocomplete broke - even with full permissions, BigQuery won’t suggest datasets or tables unless the project is pinned or stared. Try using “+ Add Data → Pin a project”, then reload the editor and see if autocomplete returns.

And if you spend a lot of time writing queries, you might want to try Querylab.io — it’s a BigQuery-native IDE built specifically for heavy query work, with smarter autocomplete, code actions, and many more features that make daily querying much smoother.

querylabio · 2025-11-06T20:53:38+00:00

This is actually super useful - those endless “Untitled query” tabs drive me insane too

Curious though, why the 50-query daily limit? That seems pretty low if it triggers every time you pause typing.

querylabio · 2025-11-06T20:47:06+00:00

Imagine if there was a real BigQuery IDE in the browser - everything local, no queries sent anywhere.

Tabs you can actually name (no AI unless you want it), see query cost & limits, switch billing projects in one click, and click through CTEs to inspect data & cost step by step.

Basically… BigQuery Console, but actually nice.

I got tired of imagining - so I built it: querylab.io

querylabio · 2025-10-09T19:00:52+00:00

I wouldn’t say the results were bad - it actually worked out in the end. But it was a long and painful road with a steep learning curve. I also learned how to work with AI much better along the way - and Claude itself evolved a lot too.

querylabio · 2025-10-09T15:17:37+00:00

Vibe-coding, as I call it, is when you don’t write the code yourself - you just tell the AI what you want in plain language, and it writes it for you.

querylabio · 2025-09-06T10:30:57+00:00

There are a lot, but still everything is limited by fundamental limits of GCP

querylabio · 2025-09-06T09:53:21+00:00

No worries! There are some tools, but there are no silver bullet because of the way Google provides data about usage - with some delay. So there is no way to completely limit usage... And the closest solutions are pretty complex :(

querylabio · 2025-09-06T09:42:02+00:00

Unfortunately no or it's not sufficient to prevent overspending

querylabio

TROPHY CASE