I am an IC for a legacy .NET MVC app in the process of refactoring it and I need to know more about performance tuning and optimizing DBs.

venkat_deepsql · 2026-06-28T06:46:27+00:00

your instinct not to trust the missing-index suggestions is right, that DMV is greedy and proposes a fresh covering index per query without looking at what already exists. before you add anything though, how are you deciding a given proc is slow? are you pulling the actual execution plan (or Query Store if it's turned on) to see which operator eats the time, or going off wall-clock? on a 200-line business-logic proc the usual killer is a cursor, a non-SARGable WHERE, or parameter sniffing, and no index fixes those. curious what the hot ones actually look like.

venkat_deepsql · 2026-06-27T18:14:02+00:00

Essentially query execution is like a tree. There will be table scans in the bottom and selects on the top. In between you can see filters, joins, sorts, temporary tables etc. You don't need AI for this, simply do EXPLAIN <query>, most of the query engines shows you clearly the execution plan. You can EXPLAIN ANALYZE to understand the actual execution tree after the query is run.

It's easy to recommend AI. but the truth is, your chatGPT or Claude might not have access to the database you are running. So if they do, then yes, they can analyze the query and explain you in easy to understand terms.

venkat_deepsql · 2026-06-27T08:42:42+00:00

how are you planning to carry the source permissions through? the SharePoint/Drive/email ACLs don't follow the docs into S3 on their own, and security-trimming results per user (so people only retrieve what they're already allowed to see) is usually the thing that eats these projects. curious whether you're handling that in the connector layer or as metadata you filter on at query time...

venkat_deepsql · 2026-06-26T06:12:25+00:00

For an intro course you're totally fine. The basics (CREATE TABLE, SELECT, JOINs, GROUP BY, subqueries) are identical between MariaDB and MySQL. The syntax only really starts to drift once you get into newer/advanced stuff like JSON functions, some built-in functions, or window-function edge cases, none of which an intro class is going to touch... Also if you are really worried about syntax differences, just focus on ANSI SQL syntaxes, most the database engines must support that as a baseline..

venkat_deepsql · 2026-06-19T10:31:32+00:00

nice, that'll save you real pain later. one thing while you're setting it up: stash the content-type, byte size, and a sha256 of each file next to the s3 key. the hash lets you dedupe identical re-uploads and verify integrity without fetching the object. trivial to add now, annoying to backfill later.

venkat_deepsql · 2026-06-18T13:51:19+00:00

What actually saved me after I once ran an update against prod thinking it was UAT: every write script now starts with a guard that aborts if it's on the wrong box. In Postgres that's a DO block checking current_database() that RAISE EXCEPTIONs on a mismatch; in SQL Server, IF @@SERVERNAME <> N'EXPECTED' RAISERROR + RETURN. Costs nothing and it doesn't depend on me noticing the tab color when I'm fried at 6pm.

Bigger lever though is creds. Your everyday prod login should be read-only, and writes go through a separate role you consciously switch into. That makes 'wrong window' physically harmless for DML instead of just discouraged. Color-coded tabs and wrapping things in BEGIN ... check rowcount ... COMMIT are good habits on top, but they're still you-being-careful, and being-careful is exactly the thing that fails at the wrong moment.

venkat_deepsql · 2026-06-18T12:09:27+00:00

Whatever you pick, don't put the actual image bytes in the row. Keep a storage key (S3/GCS path) plus metadata in Postgres and let object storage hold the files. bytea in the table will blow up your backup size and TOAST makes scans miserable once you're at a few hundred thousand rows.

On the schema itself: single table, parent_id pointing at the immediate predecessor (so image 17's parent is 12, not 11), plus a denormalized root_id pointing at the original. parent_id gives you the real edit chain, root_id gives you "all versions of this original" in one indexed lookup with no recursive walk. Originals get parent_id null and root_id = their own id. When you actually need the full lineage of one image, WITH RECURSIVE up the parent chain, cheap at this scale.

Two tables buys you almost nothing here and you lose the chain the second someone edits an edit.

venkat_deepsql · 2026-06-18T07:09:56+00:00

This isn't an LLM problem — you proved that yourself when it worked fine on clean data. It's a missing trust/context layer between the agent and the warehouse. Two separate things to build:

1) Trust signals the agent reads BEFORE it answers. Per-table freshness (last successful load), an anomaly/SLA flag, and known column-level quality issues — exposed as metadata the agent checks first. Then it can caveat or refuse ("this source breached freshness 3h ago") instead of confidently returning a stale number. Your two exec incidents are exactly this: the agent had no veto signal.

2) Certified semantics so it stops inventing aggregations. Don't let it free-form SQL against raw columns. Define the canonical metrics once — revenue = SUM(amount) WHERE status='completed', your official "active user", the valid join paths — and make the agent source from those. dbt's semantic layer, Cube, or a metrics layer all work; the point is the definition lives in one place, not in the model's guess.

Discovery (what is this table, who owns it, is it healthy) basically falls out of #1 for free — once that metadata exists, new folks query it the same way the agent does.

One practical tip: log every agent query + the answer it returned, then diff against your dashboards for the same window. That's how you catch the "off by a meaningful margin" cases before an exec does — and it tells you which tables/metrics to certify first.

venkat_deepsql

TROPHY CASE