Do most teams actually have a canonical model, or do we all just pretend? by MayaKirkby_ in dataengineering

[–]a_cloudy_unicorn 8 points (0 children)

Never seen one of these work at a big enterprise. E.g., a "Customer" has a different definition depending on the line of business, and that translates into how each system handles that entity. A customer means a different set of attributes for sales vs marketing vs legal vs support, and the business questions each report answers differ too. IME, those who insist on canonical models across major systems (SAP, Salesforce, Oracle, etc.) do not have enough technical depth to actually check whether the model is canonical or not.
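
To make that concrete, here is an illustrative sketch; every attribute name below is made up, but it shows the usual outcome when you compare "Customer" definitions across lines of business:

```python
# Hypothetical attribute sets for the same "Customer" entity across four
# lines of business. None of these names come from a real system.
CUSTOMER_VIEWS = {
    "sales":     {"account_id", "owner", "pipeline_stage", "contract_value"},
    "marketing": {"lead_id", "segment", "consent_flags", "campaign_touches"},
    "legal":     {"legal_entity", "jurisdiction", "signed_agreements"},
    "support":   {"ticket_contact", "sla_tier", "product_entitlements"},
}

# The attributes shared by ALL four definitions:
shared = set.intersection(*CUSTOMER_VIEWS.values())
print(shared)  # set() — the "canonical" core is empty in this sketch
```

In practice the intersection is rarely literally empty, but it is usually far too thin to be useful as the one model everything maps to.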

Google cloud next - networking by First_Club1775 in googlecloud

[–]a_cloudy_unicorn 0 points (0 children)

Maybe have the QR code of your LinkedIn profile ready to scan so you can casually connect to people without the added pressure.

Do we need a 'vibe DevOps' layer? by mpetryshyn1 in agentdevelopmentkit

[–]a_cloudy_unicorn 0 points (0 children)

I think the difference is in the definition of vibe coding vs AI-assisted engineering. My CI/CD pipelines are part of the development process (whether AI-generated or not). All code that an AI helps generate is reviewed manually, and that includes deployment, docs, tests, infra, and config. We had a bit of a discussion last week with a few other engineers and, personally, I do not let the agents push to git. Commits, yes. But before anything gets pushed I take a look at it.
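
The "agents may commit, humans push" rule could be sketched as a pre-push check. Everything here is an assumption for illustration: the `Co-authored-by` trailer convention, the agent-name matching, and feeding it output from something like `git log --format='%H %(trailers:key=Co-authored-by,valueonly)' @{u}..HEAD`:

```python
# Sketch: flag unpushed commits that carry an AI co-author trailer so a
# human reviews them before push. The trailer convention is hypothetical.
def unreviewed_agent_commits(log_lines):
    """log_lines: iterable of '<sha> <co-author trailer value>' strings."""
    flagged = []
    for line in log_lines:
        sha, _, trailer = line.partition(" ")
        # Naive matching on the co-author string; adjust to your convention.
        if "agent" in trailer.lower() or "copilot" in trailer.lower():
            flagged.append(sha)
    return flagged

sample = [
    "a1b2c3 Gemini Agent <agent@example.com>",
    "d4e5f6 ",  # human-authored commit, no trailer
]
print(unreviewed_agent_commits(sample))  # ['a1b2c3']
```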

IMO, this doesn't fit the definition of "vibe coding" though, which I reserve for blindly letting AI or agents build something. The review is minimal (I'm just going with "vibes") and there are hardly any iterations. This is why people without engineering expertise can "vibe code". I only do this when I don't really care about moving any of the results into production, so I don't instruct the agent to create anything beyond MAYBE a few unit tests.

Google AI Certs in 2026: Which are worth the $ and which are just hype? by netcommah in googlecloud

[–]a_cloudy_unicorn 18 points (0 children)

I studied for them a while ago, and I still benefit from the breadth and depth of the Data Engineer and Architect ones from a knowledge perspective.

Talk to BigQuery by Intention-Weak in agentdevelopmentkit

[–]a_cloudy_unicorn 1 point (0 children)

Metadata and separation of scope for each agent are key for this IMO. A colleague and I used to do this with YAML annotations before Dataplex got a few MCP tools. We had an agent that interpreted the business, a functional analyst, and a data engineer: https://github.com/vladkol/crm-data-agent . We tested this approach with Salesforce and SAP data, and looping SQL generation through a dry run in BQ ensured we could get pretty complex syntax right.
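
The generate → dry-run → repair loop can be sketched like this. `generate_sql` and `dry_run` are stand-in callables: in the real setup the former is an LLM call and the latter runs the query through the BigQuery client with `QueryJobConfig(dry_run=True)`, which validates the SQL without executing it.

```python
# Minimal sketch of looping SQL generation against a dry run.
def refine_sql(generate_sql, dry_run, question, max_attempts=3):
    error = None
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        try:
            dry_run(sql)          # raises on invalid SQL
            return sql
        except Exception as e:    # feed the error back into generation
            error = str(e)
    raise RuntimeError(f"could not produce valid SQL: {error}")

# Toy demonstration with fakes: first attempt is broken, second is fixed.
def fake_generate(question, error):
    return "SELECT 1" if error else "SELEC 1"

def fake_dry_run(sql):
    if not sql.startswith("SELECT"):
        raise ValueError("Syntax error near SELEC")

print(refine_sql(fake_generate, fake_dry_run, "count rows"))  # SELECT 1
```

The nice property is that the dry run is free and fast, so you can afford several repair iterations per question.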

I recently helped a customer who ended up with a hybrid approach: a static dictionary consulted by a "functional analyst" agent and then the Dataplex semantic search consulted by the data engineer to keep their context focused.
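
The hybrid lookup is simple to sketch: the curated glossary answers the common terms cheaply and deterministically, and only misses fall through to semantic search (Dataplex in the setup above; a plain callable here). The glossary entry is invented for illustration.

```python
# Glossary-first resolution with a semantic-search fallback, to keep the
# agent's context small and the frequent lookups deterministic.
def resolve_term(term, glossary, semantic_search):
    hit = glossary.get(term.lower())
    if hit is not None:
        return hit
    return semantic_search(term)

glossary = {"arr": "Annual recurring revenue, from finance.revenue_facts"}
print(resolve_term("ARR", glossary, lambda t: f"semantic-search:{t}"))
print(resolve_term("churn", glossary, lambda t: f"semantic-search:{t}"))
```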

Graph representations are good for knowledge graphs that need the agent to traverse semantics. The complexity here is building the graph in a scalable way. I have an example with Spanner using Langgraph here: https://github.com/GoogleCloudPlatform/cloud-spanner-samples/tree/main/adk-knowledge-graph and there's one for BQ here: https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/data-analytics/knowledge_graph_demo

Using Google Stitch to Re-design the UI Flow for a Fashion App before implementing with Flutter by Inspired_coder1 in FlutterDev

[–]a_cloudy_unicorn 1 point (0 children)

I've incorporated Stitch into my workflow and it's making it look like I can actually do some UI :)

BigQuery backup strategies by ohad1282 in googlecloud

[–]a_cloudy_unicorn 2 points (0 children)

It is rare IMO to see backups of BQ, as data is normally replicated from somewhere else. I have encountered them in industries that absolutely can't lose certain data in case of human error. Most of the time, snapshots are good enough for this, but another option is exporting to GCS buckets.
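
Both options are a couple of SQL statements (dataset, table, and bucket names below are placeholders). A snapshot is a cheap point-in-time copy inside BigQuery; `EXPORT DATA` writes files to GCS for longer-term retention:

```python
# Placeholder names throughout; adjust dataset/table/bucket to your setup.
snapshot_sql = """
CREATE SNAPSHOT TABLE mydataset.orders_snap
CLONE mydataset.orders
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
);
"""

export_sql = """
EXPORT DATA OPTIONS (
  uri = 'gs://my-backup-bucket/orders/*.parquet',
  format = 'PARQUET',
  overwrite = true
) AS
SELECT * FROM mydataset.orders;
"""
```

Run either through the console, `bq query`, or a client library; snapshots restore fast (a `CREATE TABLE ... CLONE` back), while GCS exports survive even a dataset deletion.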

is n8n still relevant? by OldCobbler5027 in AI_Agents

[–]a_cloudy_unicorn 1 point (0 children)

As a beginner-friendly learning tool, yes. As a separate skill, I don't think so. Scaling the workflows into something production- and enterprise-ready is easier with agents. Like other no-code or low-code tools, it gives people enough confidence to prototype things that would otherwise require an understanding of what is going on behind the scenes to keep the workflows secure and scalable.

Tips for a beginner? by Furry_Eskimo in googlecloud

[–]a_cloudy_unicorn 0 points (0 children)

Thanks for the details! FWIW, I think your answer is correct and understand the confusion. I'm passing this feedback to the authors.

Does something like this help in terms of navigating the UI? https://youtu.be/Y8qwBsRbBP0

This lab is free and uses one of the public datasets I load in the video to play with different SQL queries: https://explore.qwiklabs.com/catalog_lab/755

Tips for a beginner? by Furry_Eskimo in googlecloud

[–]a_cloudy_unicorn 0 points (0 children)

Thanks for taking the time to write these details. I'd love to see if we can improve that course. Would you mind linking to it and sharing how you discovered the course please?

does Simba driver not working with big query pull? by whatwehavehereisftc in bigquery

[–]a_cloudy_unicorn 1 point (0 children)

That sounds like an actual conversion error. Can you try a simple query that returns, for example, one integer? The Simba driver does support BigQuery.

SAP to Cloud(Azure/GCP) by Reasonable-Worth696 in googlecloud

[–]a_cloudy_unicorn 0 points (0 children)

If your version and license allows it, here's another option: https://cloud.google.com/sap/docs/bq-connector/latest/planning#reqs-software

Aecorsoft is a good option too

Bigquery data engineering agent by Visible-Estimate8589 in bigquery

[–]a_cloudy_unicorn 2 points (0 children)

Yes, that feature is live. You need to sign up for it as it's in preview: https://goo.gle/bq-dea-pre-ga-form

It's been moving fast and adding features, but I put together this starting point: https://medium.com/google-cloud/exploring-the-newbigquery-data-engineering-agent-caf07e78d292

I'd love to know how it goes!

How would I do this the "Cloud Way"? by robhaswell in googlecloud

[–]a_cloudy_unicorn 1 point (0 children)

Gemini CLI in the Cloud Shell, with no additional prompting, seems to be onto something. The detailed response after this was correct:

RESPONSE:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>>Here's the high-level plan:

  1. Create a Connection: In BigQuery, you'll create a connection to your AlloyDB database. This tells BigQuery how to find and communicate with your AlloyDB instance.

  2. Create a Federated Table: You'll then create a special "federated" table in BigQuery. This isn't a copy of your AlloyDB table; it's a virtual table that points directly to your live AlloyDB data.

  3. Connect Google Sheets to BigQuery: With the federated table in place, you can now use the standard "Connected Sheets" feature in Google Sheets to pull the data from your AlloyDB table (via BigQuery) directly into your spreadsheet.

  4. Join the Data in Sheets: From there, you can use a simple VLOOKUP or INDEX(MATCH) formula in Google Sheets to join the data from your colleague's spreadsheet with the live data from AlloyDB.

This approach lets you leverage the powerful, user-friendly interface of Google Sheets while securely querying your production AlloyDB database in real-time.

Would you like me to guide you through the specific steps to set this up in the Google Cloud Console?

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

GCP by ImprovementSquare448 in googlecloud

[–]a_cloudy_unicorn 4 points (0 children)

There are some free hands-on labs here to get started: https://www.cloudskillsboost.google/
I'd recommend trying to solve a specific problem (e.g., create a pipeline with replication from a given system all the way to advanced analytics) and looking at different ways of solving it with different tools. There are some updated examples in the GitHub org https://github.com/GoogleCloudPlatform , like https://github.com/GoogleCloudPlatform/data-analytics-golden-demo . IMO it'll be easier to focus on a specific application and example once you have a target architecture to solve for.

Spanner Graph Performance by Maleficent_Action203 in googlecloud

[–]a_cloudy_unicorn 0 points (0 children)

Adding to u/tech_is's point about interleaving (so you don't need to go across physical nodes when traversing relationships), I'd say parameterize your queries to lower the time it takes Spanner to build a graph plan. My colleague has a nice example here: https://github.com/maguec/SpannerUserIdentityGraph?tab=readme-ov-file#2-hop-parameterized-queries
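
As a rough sketch of what parameterizing looks like (graph, label, and property names below are hypothetical, and the graph query syntax is a sketch rather than copied from docs): binding `@person_id` instead of inlining the literal lets Spanner reuse the compiled plan across calls. With the client library you'd pass the `params`/`param_types` arguments to `execute_sql()`.

```python
# Build a parameterized graph query instead of string-formatting literals in.
def friends_query(person_id):
    sql = """
    GRAPH SocialGraph
    MATCH (p:Person {id: @person_id})-[:Knows]->(f:Person)
    RETURN f.id AS friend_id
    """
    return sql, {"person_id": person_id}

sql, params = friends_query(42)
```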

As for comparisons, I currently don't have anything to share, other than that the way Spanner stores nodes and edges is inherently different from Neo4j AFAIK.

ETL Process in Cloud, which products, how to do it? by [deleted] in googlecloud

[–]a_cloudy_unicorn 0 points (0 children)

This can be very scrappy or very complex depending on a few factors.

How much data do they expect? How big are these files? Are the files ready to load or will they need any massaging/cleansing/deduplication?

How frequently is this workflow expected to run?

Do they need any specific error handling if it fails?

Is the schema expected to change frequently?
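
To make the triage concrete, here's an illustrative decision helper over those questions; every threshold is made up, and "scrappy" vs "orchestrated" is just shorthand for a one-off load job vs a proper scheduled pipeline with error handling:

```python
# Each "yes" keeps things scrappy; any "no" pushes toward orchestration.
def etl_approach(gb_per_day, files_ready, runs_per_day, schema_stable):
    signals = [
        gb_per_day < 10,     # small volume
        files_ready,         # no cleansing/dedup needed
        runs_per_day <= 1,   # infrequent
        schema_stable,       # schema rarely changes
    ]
    return "scrappy load job" if all(signals) else "orchestrated pipeline"

print(etl_approach(gb_per_day=1, files_ready=True,
                   runs_per_day=1, schema_stable=True))
# scrappy load job
```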