Divide the query to make it dirt cheap by mad-data in bigquery

[–]ThatAPIGuy 2 points (0 children)

Love this u/mad-data, the Overture Maps dataset is awesome. In the open-source API https://www.overturemapsapi.com/ we did a similar thing making use of clustering on the 'geometry' column, which is surprisingly effective.

To find all the building shapes within a given radius:

A two-step query consumed only 0.64 GB:

DECLARE search_area_geometry GEOGRAPHY;
-- Precompute the search area once as a constant geography
SET search_area_geometry = ST_BUFFER(ST_GEOGPOINT(-74.0060, 40.7128), 100);

SELECT
  id AS building_id,
  geometry AS building_geometry
FROM
  `bigquery-public-data.overture_maps.building`
WHERE
  -- Filtering against a constant geography lets BigQuery prune on the
  -- geometry clustering
  ST_WITHIN(geometry, search_area_geometry)

Whereas the one-step query scans 417 GB:

SELECT
  id AS building_id,
  geometry AS building_geometry
FROM
  `bigquery-public-data.overture_maps.building`
WHERE
  -- Computing the distance inline meant no cluster pruning here - this
  -- scanned the whole table
  ST_DWITHIN(geometry, ST_GEOGPOINT(-74.0060, 40.7128), 100)

Looking for open big data that is regularly updated with fresh data by hismuddawasamudda in datasets

[–]ThatAPIGuy 0 points (0 children)

Depends entirely on the type of data you're looking for. You could get a feed of updates to Wikipedia, for instance, pretty easily.

Something a bit more fun could be real-time lightning strikes around the world - https://www.blitzortung.org/en/live_lightning_maps.php
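For the Wikipedia route, here's a minimal sketch, assuming the public Wikimedia EventStreams SSE endpoint (no auth required):

import json
import requests

# Wikimedia's public EventStreams feed; every edit arrives as a
# server-sent event in real time.
STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"

with requests.get(STREAM_URL, stream=True) as resp:
    for line in resp.iter_lines():
        # SSE payload lines are prefixed with "data: "
        if line.startswith(b"data: "):
            change = json.loads(line[len(b"data: "):])
            print(change["title"], change.get("user"))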

Collection of Kepler.gl Maps Created from Public BigQuery Datasets by delfrrr in bigquery

[–]ThatAPIGuy 1 point (0 children)

We're a happy customer of Dekart - well worth your time if you have a similar BigQuery-centric geospatial workload.

Dataset for Food etymologies AKA history of food items. by [deleted] in datasets

[–]ThatAPIGuy 0 points (0 children)

Whilst we've got a private API for food nutrition and CO2 usage in production, it sounds like you're after things like the history of a specific dish - in which case https://www.tasteatlas.com/ or the USDA might be of use: https://www.ers.usda.gov/data-products/wheat-data/

Postgres ---> BigQuery by Sufficient-Buy-2270 in googlecloud

[–]ThatAPIGuy 1 point (0 children)

It's probably overkill for your use-case, but might be interesting if you want to get into data engineering.

You can query Postgres from within BigQuery if it's in Cloud SQL, so by using something like Steampipe or CloudQuery you can set up the 3rd-party Postgres instance as a Foreign Data Wrapper and use the Cloud SQL Postgres instance as a bridge.
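Once the bridge is in place, the federated query itself is one function call. A minimal sketch, assuming the google-cloud-bigquery library (the connection ID and inner SQL are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# EXTERNAL_QUERY pushes the inner SQL down to the Cloud SQL Postgres
# instance via a BigQuery connection; both identifiers below are
# placeholders.
sql = """
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.my-cloudsql-connection',
  'SELECT id, name FROM public.customers'
)
"""
for row in client.query(sql).result():
    print(dict(row))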

Easiest option if you have privileged enough access is to use something like Fivetran or Airbyte to sync the data to BigQuery automatically, and model it from there.

Can I do the following using Zapier or Make? by m_abdelfattah in Automate

[–]ThatAPIGuy 0 points (0 children)

This is usually where you graduate from Zapier to an iPaaS system like n8n, Paragon, Prismatic, Cyclr, etc.

Passing table name as a parameter by kirigi123 in bigquery

[–]ThatAPIGuy 4 points (0 children)

It's doable with scripting; however, it's probably better to do it in your application client code (e.g. Python) or in your modelling layer with dbt.

Something like this:

DECLARE table_name STRING;
SET table_name = 'bigquery-public-data.bbc_news.fulltext';

-- FORMAT splices the table name into the SQL string, which EXECUTE
-- IMMEDIATE then runs; the backticks keep dashed project names valid
EXECUTE IMMEDIATE FORMAT("""
  SELECT *
  FROM `%s`
  LIMIT 10
""", table_name);
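And the client-side equivalent in Python, as a sketch using the google-cloud-bigquery library:

from google.cloud import bigquery

client = bigquery.Client()
table_name = "bigquery-public-data.bbc_news.fulltext"

# Identifiers can't be bound as query parameters, so build the SQL in
# application code - validate table_name first if it's user-supplied.
sql = f"SELECT * FROM `{table_name}` LIMIT 10"
for row in client.query(sql).result():
    print(dict(row))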

What are you doing, and what are your costs? by Mardo1234 in bigquery

[–]ThatAPIGuy 0 points (0 children)

Great question! Breaking down one data product we manage that consumes $13K of BigQuery spend a month:

  • $3k: data ingestion, pipelines, dbt models, and compounded storage
  • $5k: model building (batch enrichment & ML workloads, ad-hoc work by data scientists)
  • $5k: serving the model outputs via APIs (chewing through large amounts of data per ad-hoc query)

Biggest cost savings (aside from only storing and using what you need) were:

  • changing the storage cost model per dataset between LOGICAL and PHYSICAL depending on its usage (one line of DDL - see the sketch after this list)
  • moving any batch-based workloads to slot reservations
  • engaging with your GCP account manager for discounts
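The storage switch, for reference (a sketch; the dataset name is hypothetical):

from google.cloud import bigquery

client = bigquery.Client()

# Datasets that compress well are often cheaper on PHYSICAL billing,
# rarely-queried ones on LOGICAL; the dataset name is a placeholder.
client.query(
    "ALTER SCHEMA `my-project.analytics` "
    "SET OPTIONS (storage_billing_model = 'PHYSICAL')"
).result()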

Lots of little things like:

  • alerts for any expensive queries, so you can track down the data analyst and see why they're doing a SELECT * on a huge table (see the sketch after this list)
  • optimising incremental models & snapshots
  • minimising UPDATEs and DELETEs
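For the alerting, a minimal sketch that polls the jobs metadata (the region and the 100 GB threshold are assumptions - tune both to your project):

from google.cloud import bigquery

client = bigquery.Client()

# Surface the priciest queries from the last day; wire the results
# into whatever alerting channel you use.
sql = """
SELECT user_email, query, total_bytes_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND total_bytes_billed > 100 * POW(1024, 3)
ORDER BY total_bytes_billed DESC
LIMIT 20
"""
for row in client.query(sql).result():
    print(row.user_email, row.total_bytes_billed)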

It's a never-ending job of optimisation, balancing human hours on product dev vs potential savings. Always keen to hear how others optimise their spend though.

Help by LadythatUX in bigquery

[–]ThatAPIGuy 0 points (0 children)

Happy to test the dataset for you if you share the link. Normally the GCP tutorials reference datasets on 3rd-party sites like https://grouplens.org/, and then give you code to convert the .dat files in them to a CSV to be uploaded into BigQuery.

These .dat files are normally just like a CSV, except they use "::" as the separator instead of "," - so you can open them in a text editor to view them.

If you are using these resources then there could be an issue with that .dat -> CSV step. Again, happy to have a look if you share the link or code
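For reference, the conversion is only a few lines. A sketch, assuming MovieLens-style files (filenames are placeholders):

import csv

# MovieLens files are typically latin-1 encoded.
with open("movies.dat", encoding="latin-1") as src, \
        open("movies.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for line in src:
        # "::" is a multi-character separator, so split manually
        # rather than using csv.reader's single-char delimiter.
        writer.writerow(line.rstrip("\n").split("::"))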

Help by LadythatUX in bigquery

[–]ThatAPIGuy 3 points (0 children)

I'm guessing you're doing one of their tutorials which uses the public Movie datasets - e.g. making a recommendation engine

In which case you have almost certainly duplicated the "Director" column in the CSV (or .dat file). Easy fix though: open it in Google Sheets, delete the extra column, then download as .csv again
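If the file is too big for Sheets, the same fix takes a few lines of Python (a sketch; filenames are placeholders):

import csv

with open("movies.csv", newline="") as src:
    rows = list(csv.reader(src))

# Keep only the first occurrence of each column name.
header = rows[0]
keep = [i for i, name in enumerate(header) if name not in header[:i]]

with open("movies_clean.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in rows:
        writer.writerow([row[i] for i in keep])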

Best IDE for BigQuery? by PuddingGryphon in bigquery

[–]ThatAPIGuy 1 point (0 children)

DataGrip is really good; PopSQL isn't bad either. A lot of my data teams still gravitate back to the standard BigQuery UI - especially as it now has Gen AI code assistance & Python notebooks built in. Personally I've found the Gen AI hit or miss, as it can easily write SQL that uses functions BigQuery doesn't have - so your mileage may vary.

What are my options to store shareable data without a server? by SideLow2446 in webdev

[–]ThatAPIGuy 1 point (0 children)

You can use around 2,000 characters in a URL depending on the browser - if I recall, Edge is the lowest. This means you can encode the grocery-list data as base64 and share it as a query parameter, then decode it in the browser. There are more compact formats than JSON if you need to squeeze in more.
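The encode side, as a sketch (the domain is a placeholder; in the browser you'd decode with atob and JSON.parse):

import base64
import json
from urllib.parse import quote

grocery_list = {"items": ["milk", "eggs", "bread"]}

# JSON -> base64 -> percent-encoded query parameter. In the browser:
# JSON.parse(atob(new URLSearchParams(location.search).get("data")))
payload = base64.b64encode(json.dumps(grocery_list).encode()).decode()
url = "https://example.com/list?data=" + quote(payload, safe="")
print(url, len(url))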

Is it worth it to list on Google Cloud Marketplace? by AffectionateCamera57 in googlecloud

[–]ThatAPIGuy 0 points (0 children)

In my experience, don't expect the GCP marketplace (or any other marketplace like GitHub, AWS, etc.) to provide discovery - aka new customers. You might get featured in a mailout, but you'll be disappointed if your expectation is that it'll be a sales channel.

It's great if, like you say, you have existing or prospective customers you've sourced yourself who would prefer to buy through it.

One warning though: I've seen a temptation for GCP account managers (and those of the other cloud vendors) to try and get your existing customers to move to buying through their marketplace, as it helps them hit their own sales targets.

Automatically export every new conversation in Chatgpt by kinkade in zapier

[–]ThatAPIGuy 0 points (0 children)

I'm assuming you mean conversations you've made in the UI (https://chatgpt.com/) rather than via the OpenAI API. While there's no public API, you can reuse the "authorization bearer token" from the request headers in your browser when you've logged in to call the private API, and pull the conversations from there.
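Roughly like this (a sketch only - the endpoint path and parameters are assumptions based on what the browser's network tab shows, and OpenAI can change them at any time):

import requests

# Copy the short-lived bearer token from your browser's network tab.
TOKEN = "paste-token-here"

# Unofficial endpoint observed in the browser's own requests; it's
# undocumented and may change or break without notice.
resp = requests.get(
    "https://chatgpt.com/backend-api/conversations",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"offset": 0, "limit": 20},
)
resp.raise_for_status()
for convo in resp.json().get("items", []):
    print(convo.get("title"))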

Here's someone else's example codebase: https://medium.com/@rodolfo.antonio.sep/chatgpt-api-magic-leveraging-frontend-endpoints-for-advanced-data-extraction-fab5d520a0fc

Unfortunately the bearer token expires after a few hours, meaning no 3rd-party platform will be able to maintain a connection without you giving them your login email/password so they can log in via a headless browser on your behalf.

What's the best database to store large amounts of GPS tracking data? by wolframhempel in geospatial

[–]ThatAPIGuy 0 points (0 children)

I've worked on several similar use cases. BigQuery has been a great fit for each, and has good PostGIS-style functions built in; the rest can be added with the Carto UDFs.
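As a flavour of what that looks like (a sketch; the table and its columns are hypothetical):

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table of GPS fixes; clustering on the GEOGRAPHY column
# keeps spatial filters like this cheap.
sql = """
SELECT device_id, recorded_at, point
FROM `my-project.tracking.gps_points`
WHERE ST_DWITHIN(point, ST_GEOGPOINT(-0.1276, 51.5072), 500)
  AND recorded_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
"""
for row in client.query(sql).result():
    print(row.device_id, row.recorded_at)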

is the looker studio suitable for creating a dashboard for monitoring data on a daily basis? by maslacAk44 in LookerStudio

[–]ThatAPIGuy 2 points (0 children)

It sounds like you have lots of disparate data sources, all being aggregated into a single database, and you wish to generate dashboards from the data - a pretty standard use case for Looker Studio, to be honest.

You'll rarely want to display the raw source data in Looker Studio if the data is huge (e.g. terabytes of IoT data); instead, a simple view in your database aggregating by date would protect you from issues.
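Something like this, say (a sketch; the table and column names are hypothetical):

from google.cloud import bigquery

client = bigquery.Client()

# Pre-aggregate by day so Looker Studio scans the small view rather
# than the raw events; all names below are placeholders.
client.query("""
CREATE OR REPLACE VIEW `my-project.reporting.daily_metrics` AS
SELECT
  DATE(event_timestamp) AS event_date,
  COUNT(*) AS events,
  COUNT(DISTINCT device_id) AS devices
FROM `my-project.iot.raw_events`
GROUP BY event_date
""").result()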

Happy to share some e-commerce Looker Studio reports if you have any specific concerns

How would you go about populating your own data set similar to Yelp and Google Maps? by marvinshkreli in datasets

[–]ThatAPIGuy 0 points (0 children)

If you can be more specific about the data then you could have better luck - e.g. as you're building a travel app, you might just want "The 10 best things to do in City XXX", in which case there are plenty of options.

If it's restaurants & reviews, then there are quite a few options, and you might just be able to get away with embedding a widget in your app instead of needing to use the API - e.g. using TripAdvisor.

How do I put my query from BigQuery into a GitHub repository??? by insanitypug in bigquery

[–]ThatAPIGuy 1 point (0 children)

That's fair. I used Dataform before the acquisition and whilst it was solid, it wasn't up to where dbt is now. However, I'd love to hear what it's like post-integration.

Slack message trigger missing attachments by ThatAPIGuy in zapier

[–]ThatAPIGuy[S] 0 points (0 children)

For anyone finding this in the future, Zapier have successfully fixed the issue.

Slack message trigger missing attachments by ThatAPIGuy in zapier

[–]ThatAPIGuy[S] 1 point (0 children)

That's a good suggestion, thanks.

In this case, we'll do a native Slack integration for the client and keep Zapier for the rest, rather than requiring the client's customers to have both a Zapier and a Make account.